-- Logs begin at Tue 2019-06-18 12:09:07 PDT, end at Wed 2019-08-28 11:17:08 PDT. --
Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys cpuset
Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys cpu
Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys cpuacct
Jun 18 12:09:07 fir-md1-s1 kernel: Linux version 3.10.0-957.1.3.el7_lustre.x86_64 (sthiell@fir-io1-s1) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Fri Dec 7 14:50:35 PST 2018
Jun 18 12:09:07 fir-md1-s1 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.1.3.el7_lustre.x86_64 root=UUID=4adf0488-f60f-46c3-a712-956aaee5c4b2 ro crashkernel=auto nomodeset console=ttyS0,115200 LANG=en_US.UTF-8
Jun 18 12:09:07 fir-md1-s1 kernel: e820: BIOS-provided physical RAM map:
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000008efff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000000000090000-0x000000000009ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000000000100000-0x000000005c3dffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000005c3e0000-0x00000000643e7fff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x00000000643e8000-0x000000006cacefff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000006cacf000-0x000000006efcefff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000006efcf000-0x000000006fdfefff] ACPI NVS
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000006fdff000-0x000000006fffefff] ACPI data
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000006ffff000-0x000000006fffffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000000070000000-0x000000008fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000000100000000-0x000000107f37ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000107f380000-0x000000107fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000001080000000-0x000000207ff7ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000207ff80000-0x000000207fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000002080000000-0x000000307ff7ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000307ff80000-0x000000307fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x0000003080000000-0x000000407ff7ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: BIOS-e820: [mem 0x000000407ff80000-0x000000407fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: NX (Execute Disable) protection: active
Jun 18 12:09:07 fir-md1-s1 kernel: e820: update [mem 0x446da020-0x4470b25f] usable ==> usable
Jun 18 12:09:07 fir-md1-s1 kernel: e820: update [mem 0x446a8020-0x446d925f] usable ==> usable
Jun 18 12:09:07 fir-md1-s1 kernel: e820: update [mem 0x5b485020-0x5b48d05f] usable ==> usable
Jun 18 12:09:07 fir-md1-s1 kernel: e820: update [mem 0x4468f020-0x446a765f] usable ==> usable
Jun 18 12:09:07 fir-md1-s1 kernel: extended physical RAM map:
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000000000000000-0x000000000008efff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000000000090000-0x000000000009ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000000000100000-0x000000004468f01f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000004468f020-0x00000000446a765f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000446a7660-0x00000000446a801f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000446a8020-0x00000000446d925f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000446d9260-0x00000000446da01f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000446da020-0x000000004470b25f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000004470b260-0x000000005b48501f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000005b485020-0x000000005b48d05f] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000005b48d060-0x000000005c3dffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000005c3e0000-0x00000000643e7fff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000643e8000-0x000000006cacefff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000006cacf000-0x000000006efcefff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000006efcf000-0x000000006fdfefff] ACPI NVS
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000006fdff000-0x000000006fffefff] ACPI data
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000006ffff000-0x000000006fffffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000000070000000-0x000000008fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000000100000000-0x000000107f37ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000107f380000-0x000000107fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000001080000000-0x000000207ff7ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000207ff80000-0x000000207fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000002080000000-0x000000307ff7ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000307ff80000-0x000000307fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x0000003080000000-0x000000407ff7ffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: reserve setup_data: [mem 0x000000407ff80000-0x000000407fffffff] reserved
Jun 18 12:09:07 fir-md1-s1 kernel: efi: EFI v2.50 by Dell Inc.
Jun 18 12:09:07 fir-md1-s1 kernel: efi: ACPI=0x6fffe000 ACPI 2.0=0x6fffe014 SMBIOS=0x6eab5000 SMBIOS 3.0=0x6eab3000
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem00: type=3, attr=0xf, range=[0x0000000000000000-0x0000000000001000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem01: type=2, attr=0xf, range=[0x0000000000001000-0x0000000000002000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem02: type=7, attr=0xf, range=[0x0000000000002000-0x0000000000010000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem03: type=3, attr=0xf, range=[0x0000000000010000-0x0000000000014000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem04: type=7, attr=0xf, range=[0x0000000000014000-0x0000000000063000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem05: type=3, attr=0xf, range=[0x0000000000063000-0x000000000008f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem06: type=10, attr=0xf, range=[0x000000000008f000-0x0000000000090000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem07: type=3, attr=0xf, range=[0x0000000000090000-0x00000000000a0000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem08: type=4, attr=0xf, range=[0x0000000000100000-0x0000000000120000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem09: type=7, attr=0xf, range=[0x0000000000120000-0x0000000000c00000) (10MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem10: type=3, attr=0xf, range=[0x0000000000c00000-0x0000000001000000) (4MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem11: type=2, attr=0xf, range=[0x0000000001000000-0x000000000267a000) (22MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem12: type=7, attr=0xf, range=[0x000000000267a000-0x0000000004000000) (25MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem13: type=4, attr=0xf, range=[0x0000000004000000-0x000000000403b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem14: type=7, attr=0xf, range=[0x000000000403b000-0x000000003ecab000) (940MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem15: type=2, attr=0xf, range=[0x000000003ecab000-0x0000000040000000) (19MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem16: type=7, attr=0xf, range=[0x0000000040000000-0x000000004468f000) (70MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem17: type=2, attr=0xf, range=[0x000000004468f000-0x000000005b25c000) (363MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem18: type=1, attr=0xf, range=[0x000000005b25c000-0x000000005b475000) (2MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem19: type=7, attr=0xf, range=[0x000000005b475000-0x000000005b485000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem20: type=2, attr=0xf, range=[0x000000005b485000-0x000000005b48e000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem21: type=4, attr=0xf, range=[0x000000005b48e000-0x000000005b491000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem22: type=2, attr=0xf, range=[0x000000005b491000-0x000000005b59c000) (1MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem23: type=4, attr=0xf, range=[0x000000005b59c000-0x000000005b6bf000) (1MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem24: type=3, attr=0xf, range=[0x000000005b6bf000-0x000000005b70d000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem25: type=4, attr=0xf, range=[0x000000005b70d000-0x000000005b75b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem26: type=3, attr=0xf, range=[0x000000005b75b000-0x000000005b7bd000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem27: type=4, attr=0xf, range=[0x000000005b7bd000-0x000000005b7c7000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem28: type=3, attr=0xf, range=[0x000000005b7c7000-0x000000005b8b2000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem29: type=4, attr=0xf, range=[0x000000005b8b2000-0x000000005b8c1000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem30: type=7, attr=0xf, range=[0x000000005b8c1000-0x000000005b8c7000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem31: type=4, attr=0xf, range=[0x000000005b8c7000-0x000000005b8cc000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem32: type=3, attr=0xf, range=[0x000000005b8cc000-0x000000005b927000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem33: type=4, attr=0xf, range=[0x000000005b927000-0x000000005b931000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem34: type=3, attr=0xf, range=[0x000000005b931000-0x000000005b960000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem35: type=4, attr=0xf, range=[0x000000005b960000-0x000000005bc2f000) (2MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem36: type=3, attr=0xf, range=[0x000000005bc2f000-0x000000005bc3c000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem37: type=4, attr=0xf, range=[0x000000005bc3c000-0x000000005be3a000) (1MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem38: type=7, attr=0xf, range=[0x000000005be3a000-0x000000005be3b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem39: type=4, attr=0xf, range=[0x000000005be3b000-0x000000005be4f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem40: type=7, attr=0xf, range=[0x000000005be4f000-0x000000005be50000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem41: type=4, attr=0xf, range=[0x000000005be50000-0x000000005be5a000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem42: type=7, attr=0xf, range=[0x000000005be5a000-0x000000005be5b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem43: type=4, attr=0xf, range=[0x000000005be5b000-0x000000005be61000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem44: type=2, attr=0xf, range=[0x000000005be61000-0x000000005be63000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem45: type=4, attr=0xf, range=[0x000000005be63000-0x000000005be64000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem46: type=7, attr=0xf, range=[0x000000005be64000-0x000000005be65000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem47: type=4, attr=0xf, range=[0x000000005be65000-0x000000005be77000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem48: type=7, attr=0xf, range=[0x000000005be77000-0x000000005be78000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem49: type=4, attr=0xf, range=[0x000000005be78000-0x000000005be79000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem50: type=7, attr=0xf, range=[0x000000005be79000-0x000000005be7a000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem51: type=2, attr=0xf, range=[0x000000005be7a000-0x000000005be7b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem52: type=4, attr=0xf, range=[0x000000005be7b000-0x000000005be7f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem53: type=3, attr=0xf, range=[0x000000005be7f000-0x000000005bea2000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem54: type=4, attr=0xf, range=[0x000000005bea2000-0x000000005c003000) (1MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem55: type=3, attr=0xf, range=[0x000000005c003000-0x000000005c3e0000) (3MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem56: type=0, attr=0xf, range=[0x000000005c3e0000-0x00000000643e8000) (128MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem57: type=3, attr=0xf, range=[0x00000000643e8000-0x0000000064fae000) (11MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem58: type=4, attr=0xf, range=[0x0000000064fae000-0x0000000068acf000) (59MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem59: type=3, attr=0xf, range=[0x0000000068acf000-0x0000000068ecf000) (4MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem60: type=7, attr=0xf, range=[0x0000000068ecf000-0x0000000068ed1000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem61: type=4, attr=0xf, range=[0x0000000068ed1000-0x0000000069066000) (1MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem62: type=7, attr=0xf, range=[0x0000000069066000-0x0000000069067000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem63: type=4, attr=0xf, range=[0x0000000069067000-0x000000006907a000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem64: type=7, attr=0xf, range=[0x000000006907a000-0x000000006907b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem65: type=4, attr=0xf, range=[0x000000006907b000-0x0000000069087000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem66: type=7, attr=0xf, range=[0x0000000069087000-0x0000000069088000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem67: type=4, attr=0xf, range=[0x0000000069088000-0x0000000069089000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem68: type=7, attr=0xf, range=[0x0000000069089000-0x000000006908a000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem69: type=4, attr=0xf, range=[0x000000006908a000-0x0000000069097000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem70: type=7, attr=0xf, range=[0x0000000069097000-0x0000000069098000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem71: type=4, attr=0xf, range=[0x0000000069098000-0x000000006909b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem72: type=7, attr=0xf, range=[0x000000006909b000-0x000000006909c000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem73: type=4, attr=0xf, range=[0x000000006909c000-0x00000000690dc000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem74: type=7, attr=0xf, range=[0x00000000690dc000-0x00000000690dd000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem75: type=4, attr=0xf, range=[0x00000000690dd000-0x000000006911a000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem76: type=7, attr=0xf, range=[0x000000006911a000-0x000000006911b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem77: type=4, attr=0xf, range=[0x000000006911b000-0x000000006911f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem78: type=7, attr=0xf, range=[0x000000006911f000-0x0000000069120000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem79: type=4, attr=0xf, range=[0x0000000069120000-0x000000006914c000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem80: type=7, attr=0xf, range=[0x000000006914c000-0x000000006914d000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem81: type=4, attr=0xf, range=[0x000000006914d000-0x0000000069152000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem82: type=7, attr=0xf, range=[0x0000000069152000-0x0000000069153000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem83: type=4, attr=0xf, range=[0x0000000069153000-0x0000000069164000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem84: type=7, attr=0xf, range=[0x0000000069164000-0x0000000069165000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem85: type=4, attr=0xf, range=[0x0000000069165000-0x000000006917a000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem86: type=7, attr=0xf, range=[0x000000006917a000-0x000000006917b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem87: type=4, attr=0xf, range=[0x000000006917b000-0x000000006918b000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem88: type=7, attr=0xf, range=[0x000000006918b000-0x000000006918c000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem89: type=4, attr=0xf, range=[0x000000006918c000-0x00000000691ea000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem90: type=7, attr=0xf, range=[0x00000000691ea000-0x00000000691eb000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem91: type=4, attr=0xf, range=[0x00000000691eb000-0x00000000691ff000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem92: type=7, attr=0xf, range=[0x00000000691ff000-0x0000000069200000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem93: type=4, attr=0xf, range=[0x0000000069200000-0x0000000069204000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem94: type=7, attr=0xf, range=[0x0000000069204000-0x0000000069205000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem95: type=4, attr=0xf, range=[0x0000000069205000-0x000000006920e000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem96: type=7, attr=0xf, range=[0x000000006920e000-0x000000006920f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem97: type=4, attr=0xf, range=[0x000000006920f000-0x0000000069216000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem98: type=7, attr=0xf, range=[0x0000000069216000-0x0000000069217000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem99: type=4, attr=0xf, range=[0x0000000069217000-0x0000000069218000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem100: type=7, attr=0xf, range=[0x0000000069218000-0x0000000069219000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem101: type=4, attr=0xf, range=[0x0000000069219000-0x000000006921c000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem102: type=7, attr=0xf, range=[0x000000006921c000-0x000000006921e000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem103: type=4, attr=0xf, range=[0x000000006921e000-0x0000000069223000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem104: type=7, attr=0xf, range=[0x0000000069223000-0x0000000069224000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem105: type=4, attr=0xf, range=[0x0000000069224000-0x0000000069226000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem106: type=7, attr=0xf, range=[0x0000000069226000-0x0000000069227000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem107: type=4, attr=0xf, range=[0x0000000069227000-0x000000006922f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem108: type=7, attr=0xf, range=[0x000000006922f000-0x0000000069230000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem109: type=4, attr=0xf, range=[0x0000000069230000-0x000000006924f000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem110: type=7, attr=0xf, range=[0x000000006924f000-0x0000000069250000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem111: type=4, attr=0xf, range=[0x0000000069250000-0x000000006a2d3000) (16MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem112: type=7, attr=0xf, range=[0x000000006a2d3000-0x000000006a2d5000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem113: type=4, attr=0xf, range=[0x000000006a2d5000-0x000000006c3cf000) (32MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem114: type=7, attr=0xf, range=[0x000000006c3cf000-0x000000006c3d1000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem115: type=3, attr=0xf, range=[0x000000006c3d1000-0x000000006cacf000) (6MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem116: type=6, attr=0x800000000000000f, range=[0x000000006cacf000-0x000000006cbcf000) (1MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem117: type=5, attr=0x800000000000000f, range=[0x000000006cbcf000-0x000000006cdcf000) (2MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem118: type=0, attr=0xf, range=[0x000000006cdcf000-0x000000006efcf000) (34MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem119: type=10, attr=0xf, range=[0x000000006efcf000-0x000000006fdff000) (14MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem120: type=9, attr=0xf, range=[0x000000006fdff000-0x000000006ffff000) (2MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem121: type=4, attr=0xf, range=[0x000000006ffff000-0x0000000070000000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem122: type=7, attr=0xf, range=[0x0000000100000000-0x000000107f380000) (63475MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem123: type=7, attr=0xf, range=[0x0000001080000000-0x000000207ff80000) (65535MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem124: type=7, attr=0xf, range=[0x0000002080000000-0x000000307ff80000) (65535MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem125: type=7, attr=0xf, range=[0x0000003080000000-0x000000407ff80000) (65535MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem126: type=0, attr=0x9, range=[0x0000000070000000-0x0000000080000000) (256MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem127: type=11, attr=0x800000000000000f, range=[0x0000000080000000-0x0000000090000000) (256MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem128: type=11, attr=0x800000000000000f, range=[0x00000000fec10000-0x00000000fec11000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem129: type=11, attr=0x800000000000000f, range=[0x00000000fed80000-0x00000000fed81000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem130: type=0, attr=0x0, range=[0x000000107f380000-0x0000001080000000) (12MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem131: type=0, attr=0x0, range=[0x000000207ff80000-0x0000002080000000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem132: type=0, attr=0x0, range=[0x000000307ff80000-0x0000003080000000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: efi: mem133: type=0, attr=0x0, range=[0x000000407ff80000-0x0000004080000000) (0MB)
Jun 18 12:09:07 fir-md1-s1 kernel: SMBIOS 3.0.0 present.
Jun 18 12:09:07 fir-md1-s1 kernel: DMI: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
Jun 18 12:09:07 fir-md1-s1 kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Jun 18 12:09:07 fir-md1-s1 kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
Jun 18 12:09:07 fir-md1-s1 kernel: e820: last_pfn = 0x407ff80 max_arch_pfn = 0x400000000
Jun 18 12:09:07 fir-md1-s1 kernel: MTRR default type: uncachable
Jun 18 12:09:07 fir-md1-s1 kernel: MTRR fixed ranges enabled:
Jun 18 12:09:07 fir-md1-s1 kernel: 00000-9FFFF write-back
Jun 18 12:09:07 fir-md1-s1 kernel: A0000-FFFFF uncachable
Jun 18 12:09:07 fir-md1-s1 kernel: MTRR variable ranges enabled:
Jun 18 12:09:07 fir-md1-s1 kernel: 0 base 0000FF000000 mask FFFFFF000000 write-protect
Jun 18 12:09:07 fir-md1-s1 kernel: 1 base 000000000000 mask FFFF80000000 write-back
Jun 18 12:09:07 fir-md1-s1 kernel: 2 base 000070000000 mask FFFFF0000000 uncachable
Jun 18 12:09:07 fir-md1-s1 kernel: 3 disabled
Jun 18 12:09:07 fir-md1-s1 kernel: 4 disabled
Jun 18 12:09:07 fir-md1-s1 kernel: 5 disabled
Jun 18 12:09:07 fir-md1-s1 kernel: 6 disabled
Jun 18 12:09:07 fir-md1-s1 kernel: 7 disabled
Jun 18 12:09:07 fir-md1-s1 kernel: TOM2: 0000004080000000 aka 264192M
Jun 18 12:09:07 fir-md1-s1 kernel: PAT configuration [0-7]: WB WC UC- UC WB WP UC- UC
Jun 18 12:09:07 fir-md1-s1 kernel: e820: last_pfn = 0x70000 max_arch_pfn = 0x400000000
Jun 18 12:09:07 fir-md1-s1 kernel: Base memory trampoline at [ffff8f0500099000] 99000 size 24576
Jun 18 12:09:07 fir-md1-s1 kernel: Using GB pages for direct mapping
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc52000, 0x2b0fc52fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc53000, 0x2b0fc53fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc54000, 0x2b0fc54fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc55000, 0x2b0fc55fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc56000, 0x2b0fc56fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc57000, 0x2b0fc57fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc58000, 0x2b0fc58fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc59000, 0x2b0fc59fff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc5a000, 0x2b0fc5afff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc5b000, 0x2b0fc5bfff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc5c000, 0x2b0fc5cfff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: BRK [0x2b0fc5d000, 0x2b0fc5dfff] PGTABLE
Jun 18 12:09:07 fir-md1-s1 kernel: RAMDISK: [mem 0x3ecab000-0x3fffdfff]
Jun 18 12:09:07 fir-md1-s1 kernel: Early table checksum verification disabled
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: RSDP 000000006fffe014 00024 (v02 DELL )
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: XSDT 000000006fffd0e8 000B4 (v01 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: FACP 000000006ffef000 00114 (v06 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: DSDT 000000006ffe2000 0950B (v02 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: FACS 000000006fdd4000 00040
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SSDT 000000006fffc000 000D2 (v02 DELL PE_SC3 00000002 MSFT 04000000)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: BERT 000000006fffb000 00030 (v01 DELL BERT 00000001 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: HEST 000000006fffa000 006DC (v01 DELL HEST 00000001 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SSDT 000000006fff9000 00294 (v01 DELL PE_SC3 00000001 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SRAT 000000006fff8000 00420 (v03 DELL PE_SC3 00000001 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: MSCT 000000006fff7000 0004E (v01 DELL PE_SC3 00000000 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SLIT 000000006fff6000 0003C (v01 DELL PE_SC3 00000001 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: CRAT 000000006fff3000 02DC0 (v01 DELL PE_SC3 00000001 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: CDIT 000000006fff2000 00038 (v01 DELL PE_SC3 00000001 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: EINJ 000000006fff1000 00150 (v01 DELL PE_SC3 00000001 AMD 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SLIC 000000006fff0000 00024 (v01 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: HPET 000000006ffee000 00038 (v01 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: APIC 000000006ffed000 004B2 (v03 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: MCFG 000000006ffec000 0003C (v01 DELL PE_SC3 00000002 DELL 00000001)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SSDT 000000006ffe1000 00629 (v02 DELL xhc_port 00000001 INTL 20170119)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IVRS 000000006ffe0000 00210 (v02 DELL PE_SC3 00000001 AMD 00000000)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: SSDT 000000006ffde000 01658 (v01 AMD CPMCMN 00000001 INTL 20170119)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Local APIC address 0xfee00000
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x01 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x02 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x03 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x04 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x05 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x08 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x09 -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x0a -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x0b -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x0c -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 0 -> APIC 0x0d -> Node 0
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x10 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x11 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x12 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x13 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x14 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x15 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x18 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x19 -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x1a -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x1b -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x1c -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 1 -> APIC 0x1d -> Node 1
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x20 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x21 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x22 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x23 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x24 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x25 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x28 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x29 -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x2a -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x2b -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x2c -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 2 -> APIC 0x2d -> Node 2
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x30 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x31 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x32 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x33 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x34 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x35 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x38 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x39 -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x3a -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x3b -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x3c -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: PXM 3 -> APIC 0x3d -> Node 3
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: Node 0 PXM 0 [mem 0x00100000-0x7fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: Node 0 PXM 0 [mem 0x100000000-0x107fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: Node 1 PXM 1 [mem 0x1080000000-0x207fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: Node 2 PXM 2 [mem 0x2080000000-0x307fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: SRAT: Node 3 PXM 3 [mem 0x3080000000-0x407fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: NUMA: Initialized distance table, cnt=4
Jun 18 12:09:07 fir-md1-s1 kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x7fffffff] -> [mem 0x00000000-0x7fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x107fffffff] -> [mem 0x00000000-0x107fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: NODE_DATA(0) allocated [mem 0x107f359000-0x107f37ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: NODE_DATA(1) allocated [mem 0x207ff59000-0x207ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: NODE_DATA(2) allocated [mem 0x307ff59000-0x307ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: NODE_DATA(3) allocated [mem 0x407ff58000-0x407ff7efff]
Jun 18 12:09:07 fir-md1-s1 kernel: Reserving 176MB of memory at 720MB for crashkernel (System RAM: 261692MB)
Jun 18 12:09:07 fir-md1-s1 kernel: Zone ranges:
Jun 18 12:09:07 fir-md1-s1 kernel: DMA [mem 0x00001000-0x00ffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: DMA32 [mem 0x01000000-0xffffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: Normal [mem 0x100000000-0x407ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: Movable zone start for each node
Jun 18 12:09:07 fir-md1-s1 kernel: Early memory node ranges
Jun 18 12:09:07 fir-md1-s1 kernel: node 0: [mem 0x00001000-0x0008efff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 0: [mem 0x00090000-0x0009ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 0: [mem 0x00100000-0x5c3dffff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 0: [mem 0x643e8000-0x6cacefff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 0: [mem 0x6ffff000-0x6fffffff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 0: [mem 0x100000000-0x107f37ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 1: [mem 0x1080000000-0x207ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 2: [mem 0x2080000000-0x307ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: node 3: [mem 0x3080000000-0x407ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: Initmem setup node 0 [mem 0x00001000-0x107f37ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: On node 0 totalpages: 16661990
Jun 18 12:09:07 fir-md1-s1 kernel: DMA zone: 64 pages used for memmap
Jun 18 12:09:07 fir-md1-s1 kernel: DMA zone: 1126 pages reserved
Jun 18 12:09:07 fir-md1-s1 kernel: DMA zone: 3998 pages, LIFO batch:0
Jun 18 12:09:07 fir-md1-s1 kernel: DMA32 zone: 6380 pages used for memmap
Jun 18 12:09:07 fir-md1-s1 kernel: DMA32 zone: 408264 pages, LIFO batch:31
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 253902 pages used for memmap
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 16249728 pages, LIFO batch:31
Jun 18 12:09:07 fir-md1-s1 kernel: Initmem setup node 1 [mem 0x1080000000-0x207ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: On node 1 totalpages: 16777088
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 262142 pages used for memmap
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 16777088 pages, LIFO batch:31
Jun 18 12:09:07 fir-md1-s1 kernel: Initmem setup node 2 [mem 0x2080000000-0x307ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: On node 2 totalpages: 16777088
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 262142 pages used for memmap
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 16777088 pages, LIFO batch:31
Jun 18 12:09:07 fir-md1-s1 kernel: Initmem setup node 3 [mem 0x3080000000-0x407ff7ffff]
Jun 18 12:09:07 fir-md1-s1 kernel: On node 3 totalpages: 16777088
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 262142 pages used for memmap
Jun 18 12:09:07 fir-md1-s1 kernel: Normal zone: 16777088 pages, LIFO batch:31
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PM-Timer IO Port: 0x408
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Local APIC address 0xfee00000
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x20] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x30] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x18] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x28] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x38] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x08] lapic_id[0x02] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x09] lapic_id[0x12] enabled)
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: 
LAPIC (acpi_id[0x0a] lapic_id[0x22] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x32] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0a] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x1a] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x2a] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x3a] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x10] lapic_id[0x04] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x11] lapic_id[0x14] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x12] lapic_id[0x24] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x13] lapic_id[0x34] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x14] lapic_id[0x0c] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x15] lapic_id[0x1c] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x16] lapic_id[0x2c] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x17] lapic_id[0x3c] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x18] lapic_id[0x01] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x19] lapic_id[0x11] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x21] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x31] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x09] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x19] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x29] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x39] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x20] lapic_id[0x03] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x21] 
lapic_id[0x13] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x22] lapic_id[0x23] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x23] lapic_id[0x33] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x24] lapic_id[0x0b] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x25] lapic_id[0x1b] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x26] lapic_id[0x2b] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x27] lapic_id[0x3b] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x28] lapic_id[0x05] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x29] lapic_id[0x15] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x2a] lapic_id[0x25] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x2b] lapic_id[0x35] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x2c] lapic_id[0x0d] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x2d] lapic_id[0x1d] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x2e] lapic_id[0x2d] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x2f] lapic_id[0x3d] enabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x30] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x31] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x32] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x33] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x34] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x35] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x36] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x37] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x38] 
lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x39] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x3a] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x3b] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x3c] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x3d] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x3e] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x3f] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x40] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x41] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x42] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x43] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x44] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x45] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x46] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x47] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x48] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x49] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x4a] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x4b] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x4c] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x4d] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x4e] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC 
(acpi_id[0x4f] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x50] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x51] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x52] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x53] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x54] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x55] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x56] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x57] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x58] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x59] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x5a] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x5b] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x5c] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x5d] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x5e] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x5f] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x60] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x61] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x62] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x63] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x64] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x65] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: 
LAPIC (acpi_id[0x66] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x67] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x68] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x69] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x6a] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x6b] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x6c] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x6d] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x6e] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x6f] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x70] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x71] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x72] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x73] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x74] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x75] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x76] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x77] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x78] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x79] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x7a] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x7b] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x7c] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: 
ACPI: LAPIC (acpi_id[0x7d] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x7e] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC (acpi_id[0x7f] lapic_id[0x00] disabled) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1]) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IOAPIC (id[0x80] address[0xfec00000] gsi_base[0]) Jun 18 12:09:07 fir-md1-s1 kernel: IOAPIC[0]: apic_id 128, version 33, address 0xfec00000, GSI 0-23 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IOAPIC (id[0x81] address[0xfd880000] gsi_base[24]) Jun 18 12:09:07 fir-md1-s1 kernel: IOAPIC[1]: apic_id 129, version 33, address 0xfd880000, GSI 24-55 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IOAPIC (id[0x82] address[0xe0900000] gsi_base[56]) Jun 18 12:09:07 fir-md1-s1 kernel: IOAPIC[2]: apic_id 130, version 33, address 0xe0900000, GSI 56-87 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IOAPIC (id[0x83] address[0xc5900000] gsi_base[88]) Jun 18 12:09:07 fir-md1-s1 kernel: IOAPIC[3]: apic_id 131, version 33, address 0xc5900000, GSI 88-119 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IOAPIC (id[0x84] address[0xaa900000] gsi_base[120]) Jun 18 12:09:07 fir-md1-s1 kernel: IOAPIC[4]: apic_id 132, version 33, address 0xaa900000, GSI 120-151 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IRQ0 used by override. Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: IRQ9 used by override. 
Jun 18 12:09:07 fir-md1-s1 kernel: Using ACPI (MADT) for SMP configuration information Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: HPET id: 0x10228201 base: 0xfed00000 Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Allowing 128 CPUs, 80 hotplug CPUs Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x0008f000-0x0008ffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x000a0000-0x000fffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x4468f000-0x4468ffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x446a7000-0x446a7fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x446a8000-0x446a8fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x446d9000-0x446d9fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x446da000-0x446dafff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x4470b000-0x4470bfff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x5b485000-0x5b485fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x5b48d000-0x5b48dfff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x5c3e0000-0x643e7fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x6cacf000-0x6efcefff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x6efcf000-0x6fdfefff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x6fdff000-0x6fffefff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x70000000-0x8fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x90000000-0xfec0ffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0xfec10000-0xfec10fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0xfec11000-0xfed7ffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 
0xfed80000-0xfed80fff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0xfed81000-0xffffffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x107f380000-0x107fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x207ff80000-0x207fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registered nosave memory: [mem 0x307ff80000-0x307fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: [mem 0x90000000-0xfec0ffff] available for PCI devices Jun 18 12:09:07 fir-md1-s1 kernel: Booting paravirtualized kernel on bare hardware Jun 18 12:09:07 fir-md1-s1 kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:4 Jun 18 12:09:07 fir-md1-s1 kernel: PERCPU: Embedded 38 pages/cpu @ffff8f153ee00000 s118784 r8192 d28672 u262144 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: s118784 r8192 d28672 u262144 alloc=1*2097152 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [0] 000 004 008 012 016 020 024 028 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [0] 032 036 040 044 048 052 056 060 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [0] 064 068 072 076 080 084 088 092 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [0] 096 100 104 108 112 116 120 124 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [1] 001 005 009 013 017 021 025 029 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [1] 033 037 041 045 049 053 057 061 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [1] 065 069 073 077 081 085 089 093 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [1] 097 101 105 109 113 117 121 125 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [2] 002 006 010 014 018 022 026 030 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [2] 034 038 042 046 050 054 058 062 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [2] 066 070 074 078 082 086 090 094 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [2] 098 102 106 110 114 118 122 126 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [3] 003 007 011 015 019 023 027 031 Jun 18 12:09:07 
fir-md1-s1 kernel: pcpu-alloc: [3] 035 039 043 047 051 055 059 063 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [3] 067 071 075 079 083 087 091 095 Jun 18 12:09:07 fir-md1-s1 kernel: pcpu-alloc: [3] 099 103 107 111 115 119 123 127 Jun 18 12:09:07 fir-md1-s1 kernel: Built 4 zonelists in Zone order, mobility grouping on. Total pages: 65945356 Jun 18 12:09:07 fir-md1-s1 kernel: Policy zone: Normal Jun 18 12:09:07 fir-md1-s1 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.1.3.el7_lustre.x86_64 root=UUID=4adf0488-f60f-46c3-a712-956aaee5c4b2 ro crashkernel=auto nomodeset console=ttyS0,115200 LANG=en_US.UTF-8 Jun 18 12:09:07 fir-md1-s1 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100 Jun 18 12:09:07 fir-md1-s1 kernel: xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form Jun 18 12:09:07 fir-md1-s1 kernel: Memory: 9614216k/270532096k available (7664k kernel code, 2559080k absent, 4653740k reserved, 6055k data, 1876k init) Jun 18 12:09:07 fir-md1-s1 kernel: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=128, Nodes=4 Jun 18 12:09:07 fir-md1-s1 kernel: Hierarchical RCU implementation. Jun 18 12:09:07 fir-md1-s1 kernel: RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=128. Jun 18 12:09:07 fir-md1-s1 kernel: NR_IRQS:327936 nr_irqs:3624 0 Jun 18 12:09:07 fir-md1-s1 kernel: Console: colour dummy device 80x25 Jun 18 12:09:07 fir-md1-s1 kernel: console [ttyS0] enabled Jun 18 12:09:07 fir-md1-s1 kernel: allocated 1072693248 bytes of page_cgroup Jun 18 12:09:07 fir-md1-s1 kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups Jun 18 12:09:07 fir-md1-s1 kernel: Enabling automatic NUMA balancing. 
Configure with numa_balancing= or the kernel.numa_balancing sysctl Jun 18 12:09:07 fir-md1-s1 kernel: hpet clockevent registered Jun 18 12:09:07 fir-md1-s1 kernel: tsc: Fast TSC calibration using PIT Jun 18 12:09:07 fir-md1-s1 kernel: tsc: Detected 1996.233 MHz processor Jun 18 12:09:07 fir-md1-s1 kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 3992.46 BogoMIPS (lpj=1996233) Jun 18 12:09:07 fir-md1-s1 kernel: pid_max: default: 131072 minimum: 1024 Jun 18 12:09:07 fir-md1-s1 kernel: Security Framework initialized Jun 18 12:09:07 fir-md1-s1 kernel: SELinux: Initializing. Jun 18 12:09:07 fir-md1-s1 kernel: SELinux: Starting in permissive mode Jun 18 12:09:07 fir-md1-s1 kernel: Yama: becoming mindful. Jun 18 12:09:07 fir-md1-s1 kernel: Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: random: fast init done Jun 18 12:09:07 fir-md1-s1 kernel: Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys memory Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys devices Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys freezer Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys net_cls Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys blkio Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys perf_event Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys hugetlb Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys pids Jun 18 12:09:07 fir-md1-s1 kernel: Initializing cgroup subsys net_prio Jun 18 12:09:07 fir-md1-s1 kernel: tseg: 0070000000 Jun 18 12:09:07 fir-md1-s1 kernel: mce: CPU supports 23 MCE banks Jun 18 12:09:07 fir-md1-s1 kernel: 
LVT offset 2 assigned for vector 0xf4 Jun 18 12:09:07 fir-md1-s1 kernel: Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 512 Jun 18 12:09:07 fir-md1-s1 kernel: Last level dTLB entries: 4KB 1536, 2MB 1536, 4MB 768 Jun 18 12:09:07 fir-md1-s1 kernel: tlb_flushall_shift: 6 Jun 18 12:09:07 fir-md1-s1 kernel: Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp Jun 18 12:09:07 fir-md1-s1 kernel: FEATURE SPEC_CTRL Not Present Jun 18 12:09:07 fir-md1-s1 kernel: FEATURE IBPB_SUPPORT Present Jun 18 12:09:07 fir-md1-s1 kernel: Spectre V2 : Mitigation: Full retpoline Jun 18 12:09:07 fir-md1-s1 kernel: Freeing SMP alternatives: 28k freed Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Core revision 20130517 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: All ACPI Tables successfully acquired Jun 18 12:09:07 fir-md1-s1 kernel: ftrace: allocating 29188 entries in 115 pages Jun 18 12:09:07 fir-md1-s1 kernel: Switched APIC routing to physical flat. Jun 18 12:09:07 fir-md1-s1 kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: CPU0: AMD EPYC 7401P 24-Core Processor (fam: 17, model: 01, stepping: 02) Jun 18 12:09:07 fir-md1-s1 kernel: Performance Events: Fam17h core perfctr, AMD PMU driver. Jun 18 12:09:07 fir-md1-s1 kernel: ... version: 0 Jun 18 12:09:07 fir-md1-s1 kernel: ... bit width: 48 Jun 18 12:09:07 fir-md1-s1 kernel: ... generic registers: 6 Jun 18 12:09:07 fir-md1-s1 kernel: ... value mask: 0000ffffffffffff Jun 18 12:09:07 fir-md1-s1 kernel: ... max period: 00007fffffffffff Jun 18 12:09:07 fir-md1-s1 kernel: ... fixed-purpose events: 0 Jun 18 12:09:07 fir-md1-s1 kernel: ... event mask: 000000000000003f Jun 18 12:09:07 fir-md1-s1 kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. 
Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #1 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #2 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #3 OK Jun 18 12:09:07 fir-md1-s1 kernel: do_IRQ: 4.55 No irq handler for vector (irq -1) Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #4 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors Jun 18 12:09:07 fir-md1-s1 kernel: #5 Jun 18 12:09:07 fir-md1-s1 kernel: OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors Jun 18 12:09:07 fir-md1-s1 kernel: #6 Jun 18 12:09:07 fir-md1-s1 kernel: OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors Jun 18 12:09:07 fir-md1-s1 kernel: #7 Jun 18 12:09:07 fir-md1-s1 kernel: OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #8 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #9 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #10 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #11 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #12 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #13 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #14 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #15 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #16 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #17 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #18 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #19 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #20 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #21 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #22 OK 
Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #23 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #24 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #25 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #26 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #27 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #28 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #29 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #30 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #31 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #32 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #33 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #34 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #35 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #36 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #37 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #38 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #39 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #40 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #41 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #42 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #43 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 0, Processors #44 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 1, Processors #45 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 2, Processors #46 OK Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Booting Node 3, Processors #47 Jun 18 12:09:07 fir-md1-s1 kernel: Brought up 48 
CPUs Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Max logical packages: 3 Jun 18 12:09:07 fir-md1-s1 kernel: smpboot: Total of 48 processors activated (191638.36 BogoMIPS) Jun 18 12:09:07 fir-md1-s1 kernel: node 0 initialised, 15462980 pages in 284ms Jun 18 12:09:07 fir-md1-s1 kernel: node 2 initialised, 15984665 pages in 289ms Jun 18 12:09:07 fir-md1-s1 kernel: node 3 initialised, 15989251 pages in 289ms Jun 18 12:09:07 fir-md1-s1 kernel: node 1 initialised, 15989367 pages in 289ms Jun 18 12:09:07 fir-md1-s1 kernel: devtmpfs: initialized Jun 18 12:09:07 fir-md1-s1 kernel: EVM: security.selinux Jun 18 12:09:07 fir-md1-s1 kernel: EVM: security.ima Jun 18 12:09:07 fir-md1-s1 kernel: EVM: security.capability Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registering ACPI NVS region [mem 0x0008f000-0x0008ffff] (4096 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: PM: Registering ACPI NVS region [mem 0x6efcf000-0x6fdfefff] (14876672 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: atomic64 test passed for x86-64 platform with CX8 and with SSE Jun 18 12:09:07 fir-md1-s1 kernel: pinctrl core: initialized pinctrl subsystem Jun 18 12:09:07 fir-md1-s1 kernel: RTC time: 19:09:02, date: 06/18/19 Jun 18 12:09:07 fir-md1-s1 kernel: NET: Registered protocol family 16 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: bus type PCI registered Jun 18 12:09:07 fir-md1-s1 kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 Jun 18 12:09:07 fir-md1-s1 kernel: PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000) Jun 18 12:09:07 fir-md1-s1 kernel: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820 Jun 18 12:09:07 fir-md1-s1 kernel: PCI: Using configuration type 1 for base access Jun 18 12:09:07 fir-md1-s1 kernel: PCI: Dell System detected, enabling pci=bfsort. 
Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Added _OSI(Module Device) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Added _OSI(Processor Device) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Added _OSI(3.0 _SCP Extensions) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Added _OSI(Processor Aggregator Device) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Added _OSI(Linux-Dell-Video) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: EC: Look up EC in DSDT Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Executed 2 blocks of module-level executable AML code Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Interpreter enabled Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: (supports S0 S5) Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Using IOAPIC for interrupt routing Jun 18 12:09:07 fir-md1-s1 kernel: HEST: Table parsing has been initialized. Jun 18 12:09:07 fir-md1-s1 kernel: PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Enabled 1 GPEs in block 00 to 1F Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKA] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKB] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKD] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) *0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-3f]) Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Jun 18 12:09:07 fir-md1-s1 kernel: acpi 
PNP0A08:00: PCIe AER handled by firmware Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration Jun 18 12:09:07 fir-md1-s1 kernel: PCI host bridge to bus 0000:00 Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [io 0x0000-0x03af window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [io 0x03e0-0x0cf7 window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000c3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000c4000-0x000c7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000c8000-0x000cbfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000cc000-0x000cffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000e8000-0x000ebfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000ec000-0x000effff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x000f0000-0x000fffff window] Jun 18 12:09:07 
fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [io 0x0d00-0x3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0xe1000000-0xfebfffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [mem 0x10000000000-0x2bf3fffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: root bus resource [bus 00-3f] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:00.0: [1022:1450] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:00.2: [1022:1451] type 00 class 0x080600 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:01.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:02.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: [1022:1453] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:04.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:14.0: [1022:790b] type 00 class 0x0c0500 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:14.3: [1022:790e] type 00 class 0x060100 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.0: [1022:1460] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.1: [1022:1461] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: 
pci 0000:00:18.2: [1022:1462] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.3: [1022:1463] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.4: [1022:1464] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.5: [1022:1465] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.6: [1022:1466] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:18.7: [1022:1467] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.0: [1022:1460] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.1: [1022:1461] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.2: [1022:1462] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.3: [1022:1463] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.4: [1022:1464] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.5: [1022:1465] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.6: [1022:1466] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:19.7: [1022:1467] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.0: [1022:1460] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.1: [1022:1461] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.2: [1022:1462] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.3: [1022:1463] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.4: [1022:1464] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.5: [1022:1465] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.6: [1022:1466] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1a.7: [1022:1467] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.0: [1022:1460] type 00 class 0x060000 Jun 18 
12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.1: [1022:1461] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.2: [1022:1462] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.3: [1022:1463] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.4: [1022:1464] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.5: [1022:1465] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.6: [1022:1466] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:1b.7: [1022:1467] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: [1000:00d1] type 00 class 0x010700 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: reg 0x10: [mem 0xe1000000-0xe10fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: reg 0x18: [mem 0xe1100000-0xe11fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: reg 0x20: [mem 0xf7500000-0xf75fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: reg 0x24: [io 0x1000-0x10ff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: supports D1 D2 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: PCI bridge to [bus 01] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: bridge window [io 0x1000-0x1fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: bridge window [mem 0xf7500000-0xf75fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: bridge window [mem 0xe1000000-0xe11fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.0: [1022:145a] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.2: [1022:1456] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.2: reg 0x18: [mem 0xf7300000-0xf73fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.2: reg 0x24: [mem 0xf7400000-0xf7401fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 
0000:02:00.3: [1022:145f] type 00 class 0x0c0330 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.3: reg 0x10: [mem 0xf7200000-0xf72fffff 64bit] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.3: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.1: PCI bridge to [bus 02] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.1: bridge window [mem 0xf7200000-0xf74fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:03:00.0: [1022:1455] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:03:00.1: [1022:1468] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:03:00.1: reg 0x18: [mem 0xf7000000-0xf70fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:03:00.1: reg 0x24: [mem 0xf7100000-0xf7101fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.1: PCI bridge to [bus 03] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.1: bridge window [mem 0xf7000000-0xf71fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: on NUMA node 0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Root Bridge [PC01] (domain 0000 [bus 40-7f]) Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:01: PCIe AER handled by firmware Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:01: _OSC: platform does not support [SHPCHotplug] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:01: FADT indicates ASPM is unsupported, using BIOS configuration Jun 18 12:09:07 fir-md1-s1 kernel: PCI host bridge to bus 0000:40 Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: root bus resource [io 0x4000-0x7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: root bus resource [mem 0xc6000000-0xe0ffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: root bus resource [mem 0x2bf40000000-0x47e7fffffff window] Jun 18 12:09:07 
fir-md1-s1 kernel: pci_bus 0000:40: root bus resource [bus 40-7f] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:00.0: [1022:1450] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:00.2: [1022:1451] type 00 class 0x080600 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:01.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:02.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:03.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:04.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.0: [1022:145a] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.2: [1022:1456] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.2: reg 0x18: [mem 0xdb300000-0xdb3fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.2: reg 0x24: [mem 0xdb400000-0xdb401fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.3: [1022:145f] type 00 class 0x0c0330 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.3: reg 0x10: [mem 0xdb200000-0xdb2fffff 64bit] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.3: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.1: PCI bridge to [bus 41] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.1: bridge window [mem 0xdb200000-0xdb4fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:42:00.0: [1022:1455] type 00 
class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:42:00.1: [1022:1468] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:42:00.1: reg 0x18: [mem 0xdb000000-0xdb0fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:42:00.1: reg 0x24: [mem 0xdb100000-0xdb101fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.1: PCI bridge to [bus 42] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.1: bridge window [mem 0xdb000000-0xdb1fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: on NUMA node 1 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Root Bridge [PC02] (domain 0000 [bus 80-bf]) Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:02: PCIe AER handled by firmware Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:02: _OSC: platform does not support [SHPCHotplug] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:02: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:02: FADT indicates ASPM is unsupported, using BIOS configuration Jun 18 12:09:07 fir-md1-s1 kernel: PCI host bridge to bus 0000:80 Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: root bus resource [io 0x03b0-0x03df window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: root bus resource [mem 0x000a0000-0x000bffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: root bus resource [io 0x8000-0xbfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: root bus resource [mem 0xab000000-0xc5ffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: root bus resource [mem 0x47e80000000-0x63dbfffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: root bus resource [bus 80-bf] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:00.0: [1022:1450] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:00.2: [1022:1451] type 00 class 0x080600 Jun 18 12:09:07 fir-md1-s1 kernel: 
pci 0000:80:01.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: [1022:1453] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: [1022:1453] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:02.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: [1022:1453] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:04.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: [14e4:165f] type 00 class 0x020000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: reg 0x10: [mem 0xaf030000-0xaf03ffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: reg 0x18: [mem 0xaf040000-0xaf04ffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: reg 0x20: [mem 0xaf050000-0xaf05ffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 
0000:81:00.1: [14e4:165f] type 00 class 0x020000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: reg 0x10: [mem 0xaf000000-0xaf00ffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: reg 0x18: [mem 0xaf010000-0xaf01ffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: reg 0x20: [mem 0xaf020000-0xaf02ffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: reg 0x30: [mem 0xfffc0000-0xffffffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: PCI bridge to [bus 81] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: bridge window [mem 0xaf000000-0xaf0fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: [1556:be00] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: PCI bridge to [bus 82-83] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: bridge window [mem 0xc0000000-0xc08fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: bridge window [mem 0xae000000-0xaeffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:83:00.0: [102b:0536] type 00 class 0x030000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:83:00.0: reg 0x10: [mem 0xae000000-0xaeffffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:83:00.0: reg 0x14: [mem 0xc0808000-0xc080bfff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:83:00.0: reg 0x18: [mem 0xc0000000-0xc07fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: PCI bridge to [bus 83] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: bridge window [mem 0xc0000000-0xc08fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: bridge window [mem 0xae000000-0xaeffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: [15b3:1013] type 00 class 0x020700 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: reg 0x10: [mem 0xac000000-0xadffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: reg 0x30: [mem 
0xfff00000-0xffffffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: PME# supported from D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: PCI bridge to [bus 84] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: bridge window [mem 0xac000000-0xadffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:85:00.0: [1022:145a] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:85:00.2: [1022:1456] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:85:00.2: reg 0x18: [mem 0xc0b00000-0xc0bfffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:85:00.2: reg 0x24: [mem 0xc0c00000-0xc0c01fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.1: PCI bridge to [bus 85] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.1: bridge window [mem 0xc0b00000-0xc0cfffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.0: [1022:1455] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.1: [1022:1468] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.1: reg 0x18: [mem 0xc0900000-0xc09fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.1: reg 0x24: [mem 0xc0a00000-0xc0a01fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.2: [1022:7901] type 00 class 0x010601 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.2: reg 0x24: [mem 0xc0a02000-0xc0a02fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.2: PME# supported from D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.1: PCI bridge to [bus 86] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.1: bridge window [mem 0xc0900000-0xc0afffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: on NUMA node 2 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: PCI Root Bridge [PC03] (domain 0000 [bus c0-ff]) Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:03: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:03: PCIe AER handled by firmware Jun 18 12:09:07 fir-md1-s1 kernel: 
acpi PNP0A08:03: _OSC: platform does not support [SHPCHotplug] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:03: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:03: FADT indicates ASPM is unsupported, using BIOS configuration Jun 18 12:09:07 fir-md1-s1 kernel: acpi PNP0A08:03: host bridge window [mem 0x63dc0000000-0xffffffffffff window] ([0x80000000000-0xffffffffffff] ignored, not CPU addressable) Jun 18 12:09:07 fir-md1-s1 kernel: PCI host bridge to bus 0000:c0 Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: root bus resource [io 0xc000-0xffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: root bus resource [mem 0x90000000-0xaaffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: root bus resource [mem 0x63dc0000000-0x7ffffffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: root bus resource [bus c0-ff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:00.0: [1022:1450] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:00.2: [1022:1451] type 00 class 0x080600 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: [1022:1453] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:02.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:03.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:04.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:08.0: [1022:1452] type 00 class 0x060000 Jun 18 12:09:07 
fir-md1-s1 kernel: pci 0000:c0:08.1: [1022:1454] type 01 class 0x060400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:08.1: PME# supported from D0 D3hot D3cold Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: [1000:005f] type 00 class 0x010400 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: reg 0x10: [io 0xc000-0xc0ff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: reg 0x14: [mem 0xa5500000-0xa550ffff 64bit] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: reg 0x1c: [mem 0xa5400000-0xa54fffff 64bit] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: supports D1 D2 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: PCI bridge to [bus c1] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: bridge window [io 0xc000-0xcfff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: bridge window [mem 0xa5400000-0xa55fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c2:00.0: [1022:145a] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c2:00.2: [1022:1456] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c2:00.2: reg 0x18: [mem 0xa5200000-0xa52fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c2:00.2: reg 0x24: [mem 0xa5300000-0xa5301fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.1: PCI bridge to [bus c2] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.1: bridge window [mem 0xa5200000-0xa53fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c3:00.0: [1022:1455] type 00 class 0x130000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c3:00.1: [1022:1468] type 00 class 0x108000 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c3:00.1: reg 0x18: [mem 0xa5000000-0xa50fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c3:00.1: reg 0x24: [mem 0xa5100000-0xa5101fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:08.1: PCI bridge to [bus c3] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:08.1: bridge window [mem 
0xa5000000-0xa51fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: on NUMA node 3 Jun 18 12:09:07 fir-md1-s1 kernel: vgaarb: device added: PCI:0000:83:00.0,decodes=io+mem,owns=io+mem,locks=none Jun 18 12:09:07 fir-md1-s1 kernel: vgaarb: loaded Jun 18 12:09:07 fir-md1-s1 kernel: vgaarb: bridge control possible 0000:83:00.0 Jun 18 12:09:07 fir-md1-s1 kernel: SCSI subsystem initialized Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: bus type USB registered Jun 18 12:09:07 fir-md1-s1 kernel: usbcore: registered new interface driver usbfs Jun 18 12:09:07 fir-md1-s1 kernel: usbcore: registered new interface driver hub Jun 18 12:09:07 fir-md1-s1 kernel: usbcore: registered new device driver usb Jun 18 12:09:07 fir-md1-s1 kernel: EDAC MC: Ver: 3.0.0 Jun 18 12:09:07 fir-md1-s1 kernel: PCI: Using ACPI for IRQ routing Jun 18 12:09:07 fir-md1-s1 kernel: PCI: pci_cache_line_size set to 64 bytes Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x0008f000-0x0008ffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x4468f020-0x47ffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x446a8020-0x47ffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x446da020-0x47ffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x5b485020-0x5bffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x5c3e0000-0x5fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x6cacf000-0x6fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x107f380000-0x107fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x207ff80000-0x207fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x307ff80000-0x307fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: e820: reserve RAM buffer [mem 0x407ff80000-0x407fffffff] Jun 18 12:09:07 fir-md1-s1 kernel: NetLabel: Initializing Jun 18 12:09:07 fir-md1-s1 kernel: NetLabel: domain hash size 
= 128 Jun 18 12:09:07 fir-md1-s1 kernel: NetLabel: protocols = UNLABELED CIPSOv4 Jun 18 12:09:07 fir-md1-s1 kernel: NetLabel: unlabeled traffic allowed by default Jun 18 12:09:07 fir-md1-s1 kernel: hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 Jun 18 12:09:07 fir-md1-s1 kernel: hpet0: 3 comparators, 32-bit 14.318180 MHz counter Jun 18 12:09:07 fir-md1-s1 kernel: Switched to clocksource hpet Jun 18 12:09:07 fir-md1-s1 kernel: pnp: PnP ACPI init Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: bus type PNP registered Jun 18 12:09:07 fir-md1-s1 kernel: system 00:00: [mem 0x80000000-0x8fffffff] has been reserved Jun 18 12:09:07 fir-md1-s1 kernel: system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active) Jun 18 12:09:07 fir-md1-s1 kernel: pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active) Jun 18 12:09:07 fir-md1-s1 kernel: pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active) Jun 18 12:09:07 fir-md1-s1 kernel: pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active) Jun 18 12:09:07 fir-md1-s1 kernel: pnp: PnP ACPI: found 4 devices Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: bus type PNP unregistered Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: BAR 6: no space for [mem size 0x00040000 pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00040000 pref] Jun 18 12:09:07 
fir-md1-s1 kernel: pci 0000:00:03.1: PCI bridge to [bus 01] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: bridge window [io 0x1000-0x1fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: bridge window [mem 0xf7500000-0xf75fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:03.1: bridge window [mem 0xe1000000-0xe11fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.1: PCI bridge to [bus 02] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:07.1: bridge window [mem 0xf7200000-0xf74fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.1: PCI bridge to [bus 03] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:08.1: bridge window [mem 0xf7000000-0xf71fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 4 [io 0x0000-0x03af window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 5 [io 0x03e0-0x0cf7 window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 6 [mem 0x000c0000-0x000c3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 7 [mem 0x000c4000-0x000c7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 8 [mem 0x000c8000-0x000cbfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 9 [mem 0x000cc000-0x000cffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 10 [mem 0x000d0000-0x000d3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 11 [mem 0x000d4000-0x000d7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 12 [mem 0x000d8000-0x000dbfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 13 [mem 0x000dc000-0x000dffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 14 [mem 0x000e0000-0x000e3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 15 [mem 0x000e4000-0x000e7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 16 [mem 0x000e8000-0x000ebfff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 
0000:00: resource 17 [mem 0x000ec000-0x000effff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 18 [mem 0x000f0000-0x000fffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 19 [io 0x0d00-0x3fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 20 [mem 0xe1000000-0xfebfffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:00: resource 21 [mem 0x10000000000-0x2bf3fffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:01: resource 0 [io 0x1000-0x1fff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:01: resource 1 [mem 0xf7500000-0xf75fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:01: resource 2 [mem 0xe1000000-0xe11fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:02: resource 1 [mem 0xf7200000-0xf74fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:03: resource 1 [mem 0xf7000000-0xf71fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.1: PCI bridge to [bus 41] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:07.1: bridge window [mem 0xdb200000-0xdb4fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.1: PCI bridge to [bus 42] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:08.1: bridge window [mem 0xdb000000-0xdb1fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: resource 4 [io 0x4000-0x7fff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: resource 5 [mem 0xc6000000-0xe0ffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:40: resource 6 [mem 0x2bf40000000-0x47e7fffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:41: resource 1 [mem 0xdb200000-0xdb4fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:42: resource 1 [mem 0xdb000000-0xdb1fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: BAR 14: assigned [mem 0xab000000-0xab0fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: BAR 14: assigned [mem 0xab100000-0xab1fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: BAR 6: assigned [mem 
0xab000000-0xab03ffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: BAR 6: assigned [mem 0xab040000-0xab07ffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: PCI bridge to [bus 81] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: bridge window [mem 0xab000000-0xab0fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.1: bridge window [mem 0xaf000000-0xaf0fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: PCI bridge to [bus 83] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: bridge window [mem 0xc0000000-0xc08fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: bridge window [mem 0xae000000-0xaeffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: PCI bridge to [bus 82-83] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: bridge window [mem 0xc0000000-0xc08fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:01.2: bridge window [mem 0xae000000-0xaeffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: BAR 6: assigned [mem 0xab100000-0xab1fffff pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: PCI bridge to [bus 84] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: bridge window [mem 0xab100000-0xab1fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:03.1: bridge window [mem 0xac000000-0xadffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.1: PCI bridge to [bus 85] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:07.1: bridge window [mem 0xc0b00000-0xc0cfffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.1: PCI bridge to [bus 86] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:08.1: bridge window [mem 0xc0900000-0xc0afffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: resource 4 [io 0x03b0-0x03df window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: resource 5 [mem 0x000a0000-0x000bffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: resource 6 [io 0x8000-0xbfff window] Jun 18 12:09:07 fir-md1-s1 kernel: 
pci_bus 0000:80: resource 7 [mem 0xab000000-0xc5ffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:80: resource 8 [mem 0x47e80000000-0x63dbfffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:81: resource 1 [mem 0xab000000-0xab0fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:81: resource 2 [mem 0xaf000000-0xaf0fffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:82: resource 1 [mem 0xc0000000-0xc08fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:82: resource 2 [mem 0xae000000-0xaeffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:83: resource 1 [mem 0xc0000000-0xc08fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:83: resource 2 [mem 0xae000000-0xaeffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:84: resource 1 [mem 0xab100000-0xab1fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:84: resource 2 [mem 0xac000000-0xadffffff 64bit pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:85: resource 1 [mem 0xc0b00000-0xc0cfffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:86: resource 1 [mem 0xc0900000-0xc0afffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: BAR 6: no space for [mem size 0x00100000 pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: BAR 6: failed to assign [mem size 0x00100000 pref] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: PCI bridge to [bus c1] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: bridge window [io 0xc000-0xcfff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:01.1: bridge window [mem 0xa5400000-0xa55fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.1: PCI bridge to [bus c2] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:07.1: bridge window [mem 0xa5200000-0xa53fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:08.1: PCI bridge to [bus c3] Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:08.1: bridge window [mem 0xa5000000-0xa51fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: resource 4 [io 
0xc000-0xffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: resource 5 [mem 0x90000000-0xaaffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c0: resource 6 [mem 0x63dc0000000-0x7ffffffffff window] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c1: resource 0 [io 0xc000-0xcfff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c1: resource 1 [mem 0xa5400000-0xa55fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c2: resource 1 [mem 0xa5200000-0xa53fffff] Jun 18 12:09:07 fir-md1-s1 kernel: pci_bus 0000:c3: resource 1 [mem 0xa5000000-0xa51fffff] Jun 18 12:09:07 fir-md1-s1 kernel: NET: Registered protocol family 2 Jun 18 12:09:07 fir-md1-s1 kernel: TCP established hash table entries: 524288 (order: 10, 4194304 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: TCP: Hash tables configured (established 524288 bind 65536) Jun 18 12:09:07 fir-md1-s1 kernel: TCP: reno registered Jun 18 12:09:07 fir-md1-s1 kernel: UDP hash table entries: 65536 (order: 9, 2097152 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: NET: Registered protocol family 1 Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:83:00.0: Boot video device Jun 18 12:09:07 fir-md1-s1 kernel: PCI: CLS 32 bytes, default 64 Jun 18 12:09:07 fir-md1-s1 kernel: Unpacking initramfs... 
Jun 18 12:09:07 fir-md1-s1 kernel: Freeing initrd memory: 19788k freed Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: IOMMU performance counters supported Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: IOMMU performance counters supported Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: IOMMU performance counters supported Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: IOMMU performance counters supported Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:01.0 to group 0 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:02.0 to group 1 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:03.0 to group 2 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:03.1 to group 2 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:04.0 to group 3 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:07.0 to group 4 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:07.1 to group 4 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:08.0 to group 5 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:08.1 to group 5 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:14.0 to group 6 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:14.3 to group 6 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.0 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.1 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.2 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.3 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.4 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.5 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.6 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:18.7 to group 7 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.0 to group 8 Jun 18 
12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.1 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.2 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.3 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.4 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.5 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.6 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:19.7 to group 8 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.0 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.1 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.2 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.3 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.4 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.5 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.6 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1a.7 to group 9 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.0 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.1 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.2 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.3 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.4 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.5 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.6 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:00:1b.7 to group 10 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:01:00.0 to group 2 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:02:00.0 to group 
4 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:02:00.2 to group 4 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:02:00.3 to group 4 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:03:00.0 to group 5 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:03:00.1 to group 5 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:01.0 to group 11 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:02.0 to group 12 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:03.0 to group 13 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:04.0 to group 14 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:07.0 to group 15 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:07.1 to group 15 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:08.0 to group 16 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:40:08.1 to group 16 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:41:00.0 to group 15 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:41:00.2 to group 15 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:41:00.3 to group 15 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:42:00.0 to group 16 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:42:00.1 to group 16 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:01.0 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:01.1 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:01.2 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:02.0 to group 18 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:03.0 to group 19 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:03.1 to group 19 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:04.0 to group 20 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 
0000:80:07.0 to group 21 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:07.1 to group 21 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:08.0 to group 22 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:80:08.1 to group 22 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:81:00.0 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:81:00.1 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:82:00.0 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:83:00.0 to group 17 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:84:00.0 to group 19 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:85:00.0 to group 21 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:85:00.2 to group 21 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:86:00.0 to group 22 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:86:00.1 to group 22 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:86:00.2 to group 22 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:01.0 to group 23 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:01.1 to group 23 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:02.0 to group 24 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:03.0 to group 25 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:04.0 to group 26 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:07.0 to group 27 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:07.1 to group 27 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:08.0 to group 28 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c0:08.1 to group 28 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c1:00.0 to group 23 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c2:00.0 to group 27 Jun 18 12:09:07 fir-md1-s1 
kernel: iommu: Adding device 0000:c2:00.2 to group 27 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c3:00.0 to group 28 Jun 18 12:09:07 fir-md1-s1 kernel: iommu: Adding device 0000:c3:00.1 to group 28 Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Extended features (0xf77ef22294ada): Jun 18 12:09:07 fir-md1-s1 kernel: PPR NX GT IA GA PC GA_vAPIC Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40 Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Extended features (0xf77ef22294ada): Jun 18 12:09:07 fir-md1-s1 kernel: PPR NX GT IA GA PC GA_vAPIC Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Found IOMMU at 0000:80:00.2 cap 0x40 Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Extended features (0xf77ef22294ada): Jun 18 12:09:07 fir-md1-s1 kernel: PPR NX GT IA GA PC GA_vAPIC Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Found IOMMU at 0000:c0:00.2 cap 0x40 Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Extended features (0xf77ef22294ada): Jun 18 12:09:07 fir-md1-s1 kernel: PPR NX GT IA GA PC GA_vAPIC Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Interrupt remapping enabled Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: virtual APIC enabled Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:00:00.2: irq 26 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:40:00.2: irq 27 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:80:00.2: irq 28 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c0:00.2: irq 29 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: AMD-Vi: Lazy IO/TLB flushing enabled Jun 18 12:09:07 fir-md1-s1 kernel: perf: AMD NB counters detected Jun 18 12:09:07 fir-md1-s1 kernel: perf: AMD LLC counters detected Jun 18 12:09:07 fir-md1-s1 kernel: sha1_ssse3: Using SHA-NI optimized SHA-1 implementation Jun 18 12:09:07 fir-md1-s1 kernel: sha256_ssse3: Using SHA-256-NI optimized SHA-256 implementation Jun 18 12:09:07 fir-md1-s1 kernel: futex hash table entries: 
32768 (order: 9, 2097152 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: Initialise system trusted keyring Jun 18 12:09:07 fir-md1-s1 kernel: audit: initializing netlink socket (disabled) Jun 18 12:09:07 fir-md1-s1 kernel: type=2000 audit(1560884939.538:1): initialized Jun 18 12:09:07 fir-md1-s1 kernel: HugeTLB registered 1 GB page size, pre-allocated 0 pages Jun 18 12:09:07 fir-md1-s1 kernel: HugeTLB registered 2 MB page size, pre-allocated 0 pages Jun 18 12:09:07 fir-md1-s1 kernel: zpool: loaded Jun 18 12:09:07 fir-md1-s1 kernel: zbud: loaded Jun 18 12:09:07 fir-md1-s1 kernel: VFS: Disk quotas dquot_6.5.2 Jun 18 12:09:07 fir-md1-s1 kernel: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Jun 18 12:09:07 fir-md1-s1 kernel: msgmni has been set to 32768 Jun 18 12:09:07 fir-md1-s1 kernel: Key type big_key registered Jun 18 12:09:07 fir-md1-s1 kernel: SELinux: Registering netfilter hooks Jun 18 12:09:07 fir-md1-s1 kernel: NET: Registered protocol family 38 Jun 18 12:09:07 fir-md1-s1 kernel: Key type asymmetric registered Jun 18 12:09:07 fir-md1-s1 kernel: Asymmetric key parser 'x509' registered Jun 18 12:09:07 fir-md1-s1 kernel: Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248) Jun 18 12:09:07 fir-md1-s1 kernel: io scheduler noop registered Jun 18 12:09:07 fir-md1-s1 kernel: io scheduler deadline registered (default) Jun 18 12:09:07 fir-md1-s1 kernel: io scheduler cfq registered Jun 18 12:09:07 fir-md1-s1 kernel: io scheduler mq-deadline registered Jun 18 12:09:07 fir-md1-s1 kernel: io scheduler kyber registered Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:00:03.1: irq 30 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:00:07.1: irq 31 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:00:08.1: irq 33 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:40:07.1: irq 34 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:40:08.1: irq 36 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 
0000:80:01.1: irq 37 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:01.2: irq 38 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:03.1: irq 39 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:07.1: irq 41 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:08.1: irq 43 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:c0:01.1: irq 44 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:c0:07.1: irq 46 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:c0:08.1: irq 48 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:00:03.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:01:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:00:03.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:00:07.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.2: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:02:00.3: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:00:07.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:00:08.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:03:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:03:00.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:00:08.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:40:07.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.2: Signaling PME 
through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:41:00.3: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:40:07.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:40:08.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:42:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:42:00.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:40:08.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:01.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:81:00.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:80:01.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:01.2: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:82:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:83:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:80:01.2:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:03.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:84:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:80:03.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:07.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:85:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:85:00.2: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 
0000:80:07.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:80:08.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:86:00.2: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:80:08.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:c0:01.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c1:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:c0:01.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:c0:07.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c2:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c2:00.2: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:c0:07.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pcieport 0000:c0:08.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c3:00.0: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pci 0000:c3:00.1: Signaling PME through PCIe PME interrupt Jun 18 12:09:07 fir-md1-s1 kernel: pcie_pme 0000:c0:08.1:pcie001: service driver pcie_pme loaded Jun 18 12:09:07 fir-md1-s1 kernel: pci_hotplug: PCI Hot Plug PCI Core version: 0.5 Jun 18 12:09:07 fir-md1-s1 kernel: pciehp: PCI Express Hot Plug Controller Driver version: 0.4 Jun 18 12:09:07 fir-md1-s1 kernel: shpchp 0000:82:00.0: Cannot get control of SHPC hotplug Jun 18 12:09:07 fir-md1-s1 kernel: shpchp: Standard Hot Plug PCI Controller Driver version: 0.4 Jun 18 12:09:07 fir-md1-s1 kernel: efifb: 
probing for efifb Jun 18 12:09:07 fir-md1-s1 kernel: efifb: framebuffer at 0xae000000, mapped to 0xffffb01219800000, using 3072k, total 3072k Jun 18 12:09:07 fir-md1-s1 kernel: efifb: mode is 1024x768x32, linelength=4096, pages=1 Jun 18 12:09:07 fir-md1-s1 kernel: efifb: scrolling: redraw Jun 18 12:09:07 fir-md1-s1 kernel: efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0 Jun 18 12:09:07 fir-md1-s1 kernel: Console: switching to colour frame buffer device 128x48 Jun 18 12:09:07 fir-md1-s1 kernel: fb0: EFI VGA frame buffer device Jun 18 12:09:07 fir-md1-s1 kernel: input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input0 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Power Button [PWRB] Jun 18 12:09:07 fir-md1-s1 kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1 Jun 18 12:09:07 fir-md1-s1 kernel: ACPI: Power Button [PWRF] Jun 18 12:09:07 fir-md1-s1 kernel: GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. Jun 18 12:09:07 fir-md1-s1 kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled Jun 18 12:09:07 fir-md1-s1 kernel: 00:02: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A Jun 18 12:09:07 fir-md1-s1 kernel: 00:03: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A Jun 18 12:09:07 fir-md1-s1 kernel: Non-volatile memory driver v1.3 Jun 18 12:09:07 fir-md1-s1 kernel: Linux agpgart interface v0.103 Jun 18 12:09:07 fir-md1-s1 kernel: crash memory driver: version 1.1 Jun 18 12:09:07 fir-md1-s1 kernel: rdac: device handler registered Jun 18 12:09:07 fir-md1-s1 kernel: hp_sw: device handler registered Jun 18 12:09:07 fir-md1-s1 kernel: emc: device handler registered Jun 18 12:09:07 fir-md1-s1 kernel: alua: device handler registered Jun 18 12:09:07 fir-md1-s1 kernel: libphy: Fixed MDIO Bus: probed Jun 18 12:09:07 fir-md1-s1 kernel: ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver Jun 18 12:09:07 fir-md1-s1 kernel: ehci-pci: EHCI PCI platform driver Jun 18 12:09:07 fir-md1-s1 kernel: ohci_hcd: USB 1.1 'Open' Host 
Controller (OHCI) Driver Jun 18 12:09:07 fir-md1-s1 kernel: ohci-pci: OHCI PCI platform driver Jun 18 12:09:07 fir-md1-s1 kernel: uhci_hcd: USB Universal Host Controller Interface driver Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: new USB bus registered, assigned bus number 1 Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: hcc params 0x0270f665 hci version 0x100 quirks 0x00000410 Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 50 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 51 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 52 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 53 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 54 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 55 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 56 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: irq 57 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: usb usb1: New USB device found, idVendor=1d6b, idProduct=0002 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb1: Product: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: usb usb1: Manufacturer: Linux 3.10.0-957.1.3.el7_lustre.x86_64 xhci-hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb usb1: SerialNumber: 0000:02:00.3 Jun 18 12:09:07 fir-md1-s1 kernel: hub 1-0:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 1-0:1.0: 2 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:02:00.3: new USB bus registered, assigned bus number 2 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. 
Jun 18 12:09:07 fir-md1-s1 kernel: usb usb2: New USB device found, idVendor=1d6b, idProduct=0003 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb2: Product: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: usb usb2: Manufacturer: Linux 3.10.0-957.1.3.el7_lustre.x86_64 xhci-hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb usb2: SerialNumber: 0000:02:00.3 Jun 18 12:09:07 fir-md1-s1 kernel: hub 2-0:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 2-0:1.0: 2 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: new USB bus registered, assigned bus number 3 Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: hcc params 0x0270f665 hci version 0x100 quirks 0x00000410 Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 59 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 60 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 61 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 62 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 63 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 64 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 65 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: irq 66 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: usb usb3: New USB device found, idVendor=1d6b, idProduct=0002 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb3: Product: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: usb usb3: Manufacturer: Linux 3.10.0-957.1.3.el7_lustre.x86_64 xhci-hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb usb3: SerialNumber: 0000:41:00.3 Jun 18 12:09:07 fir-md1-s1 kernel: 
hub 3-0:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-0:1.0: 2 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: xhci_hcd 0000:41:00.3: new USB bus registered, assigned bus number 4 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. Jun 18 12:09:07 fir-md1-s1 kernel: usb usb4: New USB device found, idVendor=1d6b, idProduct=0003 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 Jun 18 12:09:07 fir-md1-s1 kernel: usb usb4: Product: xHCI Host Controller Jun 18 12:09:07 fir-md1-s1 kernel: usb usb4: Manufacturer: Linux 3.10.0-957.1.3.el7_lustre.x86_64 xhci-hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb usb4: SerialNumber: 0000:41:00.3 Jun 18 12:09:07 fir-md1-s1 kernel: hub 4-0:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 4-0:1.0: 2 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: usbcore: registered new interface driver usbserial_generic Jun 18 12:09:07 fir-md1-s1 kernel: usbserial: USB Serial support registered for generic Jun 18 12:09:07 fir-md1-s1 kernel: i8042: PNP: No PS/2 controller found. Probing ports directly. 
Jun 18 12:09:07 fir-md1-s1 kernel: usb 1-1: new high-speed USB device number 2 using xhci_hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1: new high-speed USB device number 2 using xhci_hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb 1-1: New USB device found, idVendor=0424, idProduct=2744 Jun 18 12:09:07 fir-md1-s1 kernel: usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Jun 18 12:09:07 fir-md1-s1 kernel: usb 1-1: Product: USB2734 Jun 18 12:09:07 fir-md1-s1 kernel: usb 1-1: Manufacturer: Microchip Tech Jun 18 12:09:07 fir-md1-s1 kernel: hub 1-1:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 1-1:1.0: 4 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1: New USB device found, idVendor=1604, idProduct=10c0 Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-1:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-1:1.0: 4 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: usb 2-1: New USB device found, idVendor=0424, idProduct=5744 Jun 18 12:09:07 fir-md1-s1 kernel: usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=0 Jun 18 12:09:07 fir-md1-s1 kernel: usb 2-1: Product: USB5734 Jun 18 12:09:07 fir-md1-s1 kernel: usb 2-1: Manufacturer: Microchip Tech Jun 18 12:09:07 fir-md1-s1 kernel: hub 2-1:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 2-1:1.0: 4 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: usb: port power management may be unreliable Jun 18 12:09:07 fir-md1-s1 kernel: i8042: No controller found Jun 18 12:09:07 fir-md1-s1 kernel: tsc: Refined TSC clocksource calibration: 1996.249 MHz Jun 18 12:09:07 fir-md1-s1 kernel: mousedev: PS/2 mouse device common for all mice Jun 18 12:09:07 fir-md1-s1 kernel: rtc_cmos 00:01: RTC can wake from S4 Jun 18 12:09:07 fir-md1-s1 kernel: rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0 
Jun 18 12:09:07 fir-md1-s1 kernel: rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs Jun 18 12:09:07 fir-md1-s1 kernel: cpuidle: using governor menu Jun 18 12:09:07 fir-md1-s1 kernel: EFI Variables Facility v0.08 2004-May-17 Jun 18 12:09:07 fir-md1-s1 kernel: hidraw: raw HID events driver (C) Jiri Kosina Jun 18 12:09:07 fir-md1-s1 kernel: usbcore: registered new interface driver usbhid Jun 18 12:09:07 fir-md1-s1 kernel: usbhid: USB HID core driver Jun 18 12:09:07 fir-md1-s1 kernel: drop_monitor: Initializing network drop monitor service Jun 18 12:09:07 fir-md1-s1 kernel: TCP: cubic registered Jun 18 12:09:07 fir-md1-s1 kernel: Initializing XFRM netlink socket Jun 18 12:09:07 fir-md1-s1 kernel: NET: Registered protocol family 10 Jun 18 12:09:07 fir-md1-s1 kernel: NET: Registered protocol family 17 Jun 18 12:09:07 fir-md1-s1 kernel: mpls_gso: MPLS GSO support Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU0: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU1: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU2: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU3: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU4: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU5: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1.1: new high-speed USB device number 3 using xhci_hcd Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU6: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU7: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: Switched to clocksource tsc Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU8: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU9: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU10: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU11: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: 
CPU12: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU13: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU14: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU15: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU16: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU17: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU18: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU19: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU20: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU21: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU22: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU23: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU24: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU25: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU26: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU27: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU28: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU29: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU30: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU31: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU32: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU33: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU34: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU35: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU36: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU37: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU38: 
patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU39: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU40: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU41: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU42: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU43: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU44: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU45: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU46: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: CPU47: patch_level=0x08001227 Jun 18 12:09:07 fir-md1-s1 kernel: microcode: Microcode Update Driver: v2.01 , Peter Oruba Jun 18 12:09:07 fir-md1-s1 kernel: PM: Hibernation image not present or could not be loaded. Jun 18 12:09:07 fir-md1-s1 kernel: Loading compiled-in X.509 certificates Jun 18 12:09:07 fir-md1-s1 kernel: Loaded X.509 cert 'Red Hat Enterprise Linux Driver Update Program (key 3): bf57f3e87362bc7229d9f465321773dfd1f77a80' Jun 18 12:09:07 fir-md1-s1 kernel: Loaded X.509 cert 'Red Hat Enterprise Linux kpatch signing key: 4d38fd864ebe18c5f0b72e3852e2014c3a676fc8' Jun 18 12:09:07 fir-md1-s1 kernel: Loaded X.509 cert 'Red Hat Enterprise Linux kernel signing key: 26463bf7b35aa6e910b2216d61318fa5ff5b7954' Jun 18 12:09:07 fir-md1-s1 kernel: registered taskstats version 1 Jun 18 12:09:07 fir-md1-s1 kernel: Key type trusted registered Jun 18 12:09:07 fir-md1-s1 kernel: Key type encrypted registered Jun 18 12:09:07 fir-md1-s1 kernel: IMA: No TPM chip found, activating TPM-bypass! 
(rc=-19) Jun 18 12:09:07 fir-md1-s1 kernel: Magic number: 7:983:192 Jun 18 12:09:07 fir-md1-s1 kernel: acpi device:1e: hash matches Jun 18 12:09:07 fir-md1-s1 kernel: memory memory1550: hash matches Jun 18 12:09:07 fir-md1-s1 kernel: memory memory763: hash matches Jun 18 12:09:07 fir-md1-s1 kernel: rtc_cmos 00:01: setting system clock to 2019-06-18 19:09:06 UTC (1560884946) Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1.1: New USB device found, idVendor=1604, idProduct=10c0 Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-1.1:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-1.1:1.0: 4 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1.4: new high-speed USB device number 4 using xhci_hcd Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1.4: New USB device found, idVendor=1604, idProduct=10c0 Jun 18 12:09:07 fir-md1-s1 kernel: usb 3-1.4: New USB device strings: Mfr=0, Product=0, SerialNumber=0 Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-1.4:1.0: USB hub found Jun 18 12:09:07 fir-md1-s1 kernel: hub 3-1.4:1.0: 4 ports detected Jun 18 12:09:07 fir-md1-s1 kernel: Freeing unused kernel memory: 1876k freed Jun 18 12:09:07 fir-md1-s1 kernel: Write protecting the kernel read-only data: 12288k Jun 18 12:09:07 fir-md1-s1 kernel: Freeing unused kernel memory: 516k freed Jun 18 12:09:07 fir-md1-s1 kernel: Freeing unused kernel memory: 600k freed Jun 18 12:09:07 fir-md1-s1 systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN) Jun 18 12:09:07 fir-md1-s1 systemd[1]: Detected architecture x86-64. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Running in initial RAM disk. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Set hostname to . Jun 18 12:09:07 fir-md1-s1 systemd[1]: Reached target Swap. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Reached target Timers. 
Jun 18 12:09:07 fir-md1-s1 systemd[1]: Created slice Root Slice. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Listening on udev Control Socket. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Created slice System Slice. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Reached target Slices. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Listening on Journal Socket. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Starting Load Kernel Modules... Jun 18 12:09:07 fir-md1-s1 systemd[1]: Starting Create list of required static device nodes for the current kernel... Jun 18 12:09:07 fir-md1-s1 systemd[1]: Starting Journal Service... Jun 18 12:09:07 fir-md1-s1 systemd[1]: Starting dracut cmdline hook... Jun 18 12:09:07 fir-md1-s1 systemd[1]: Starting Setup Virtual Console... Jun 18 12:09:07 fir-md1-s1 systemd[1]: Listening on udev Kernel Socket. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Reached target Sockets. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Reached target Local File Systems. Jun 18 12:09:07 fir-md1-s1 systemd[1]: Started Journal Service. Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas: loading out-of-tree module taints kernel. Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas: module verification failed: signature and/or required key missing - tainting kernel Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas version 27.00.00.00 loaded Jun 18 12:09:07 fir-md1-s1 kernel: pps_core: LinuxPPS API ver. 
1 registered Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (263565264 kB) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: IOC Number : 0 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 68 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 69 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 70 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 71 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 72 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 73 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 74 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 75 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 76 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 77 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 78 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 79 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 80 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 81 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 82 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 83 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 84 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 85 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 86 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 87 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 88 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 
89 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 90 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 91 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 92 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 93 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 94 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 95 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 96 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 97 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 98 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 99 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 100 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 101 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 102 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 103 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 104 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 105 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 106 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 107 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 108 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 109 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 110 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 111 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 112 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 113 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 114 for MSI/MSI-X Jun 18 12:09:07 
fir-md1-s1 kernel: mpt3sas 0000:01:00.0: irq 115 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 68 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 69 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 70 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 71 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 72 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 73 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 74 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 75 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix8: PCI-MSI-X enabled: IRQ 76 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix9: PCI-MSI-X enabled: IRQ 77 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix10: PCI-MSI-X enabled: IRQ 78 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix11: PCI-MSI-X enabled: IRQ 79 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix12: PCI-MSI-X enabled: IRQ 80 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix13: PCI-MSI-X enabled: IRQ 81 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix14: PCI-MSI-X enabled: IRQ 82 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix15: PCI-MSI-X enabled: IRQ 83 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix16: PCI-MSI-X enabled: IRQ 84 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix17: PCI-MSI-X enabled: IRQ 85 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix18: PCI-MSI-X enabled: IRQ 86 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix19: PCI-MSI-X enabled: IRQ 87 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix20: PCI-MSI-X enabled: IRQ 88 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix21: PCI-MSI-X enabled: IRQ 89 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix22: PCI-MSI-X enabled: IRQ 90 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix23: PCI-MSI-X enabled: IRQ 91 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix24: PCI-MSI-X enabled: IRQ 
92 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix25: PCI-MSI-X enabled: IRQ 93 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix26: PCI-MSI-X enabled: IRQ 94 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix27: PCI-MSI-X enabled: IRQ 95 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix28: PCI-MSI-X enabled: IRQ 96 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix29: PCI-MSI-X enabled: IRQ 97 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix30: PCI-MSI-X enabled: IRQ 98 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix31: PCI-MSI-X enabled: IRQ 99 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix32: PCI-MSI-X enabled: IRQ 100 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix33: PCI-MSI-X enabled: IRQ 101 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix34: PCI-MSI-X enabled: IRQ 102 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix35: PCI-MSI-X enabled: IRQ 103 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix36: PCI-MSI-X enabled: IRQ 104 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix37: PCI-MSI-X enabled: IRQ 105 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix38: PCI-MSI-X enabled: IRQ 106 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix39: PCI-MSI-X enabled: IRQ 107 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix40: PCI-MSI-X enabled: IRQ 108 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix41: PCI-MSI-X enabled: IRQ 109 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix42: PCI-MSI-X enabled: IRQ 110 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix43: PCI-MSI-X enabled: IRQ 111 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix44: PCI-MSI-X enabled: IRQ 112 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix45: PCI-MSI-X enabled: IRQ 113 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix46: PCI-MSI-X enabled: IRQ 114 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas0-msix47: PCI-MSI-X enabled: IRQ 115 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: iomem(0x00000000e1000000), mapped(0xffffb0121a000000), size(1048576) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: 
ioport(0x0000000000001000), size(256) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: IOC Number : 0 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k Jun 18 12:09:07 fir-md1-s1 kernel: pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti Jun 18 12:09:07 fir-md1-s1 kernel: megasas: 07.705.02.00-rh1 Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: FW now in Ready state Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: 64 bit DMA mask and 32 bit consistent mask Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 117 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 118 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 119 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 120 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 121 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 122 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 123 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 124 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 125 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 126 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 127 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 128 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 129 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 130 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 131 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 132 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 133 for MSI/MSI-X Jun 
18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 134 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 135 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 136 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 137 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 138 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 139 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 140 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 141 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 142 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 143 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 144 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 145 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 146 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 147 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 148 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 149 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 150 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 151 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 152 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 153 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 154 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 155 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 156 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 157 
for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 158 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 159 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 160 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 161 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 162 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 163 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: irq 164 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: firmware supports msix : (96) Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: current msix/online cpus : (48/48) Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: RDPQ mode : (disabled) Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: Current firmware supports maximum commands: 928 LDIO threshold: 237 Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: Configured max firmware commands: 927 Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: FW supports sync cache : No Jun 18 12:09:07 fir-md1-s1 kernel: PTP clock support registered Jun 18 12:09:07 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: Allocated physical memory: size(38831 kB) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: Current Controller Queue Depth(7564), Max Controller Queue Depth(7680) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: Scatter Gather Elements per IO(128) Jun 18 12:09:07 fir-md1-s1 kernel: libata version 3.00 loaded. 
Jun 18 12:09:07 fir-md1-s1 kernel: Compat-mlnx-ofed backport release: b4fdfac Jun 18 12:09:07 fir-md1-s1 kernel: Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git b4fdfac Jun 18 12:09:07 fir-md1-s1 kernel: compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: FW Package Version(08.00.00.00) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: SAS3616: FWVersion(08.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: Protocol=(Initiator,Target,NVMe), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ) Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: : host protection capabilities enabled DIF1 DIF2 DIF3 Jun 18 12:09:07 fir-md1-s1 kernel: scsi host0: Fusion MPT SAS Host Jun 18 12:09:07 fir-md1-s1 kernel: mpt3sas_cm0: sending port enable !! Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: Init cmd return status SUCCESS for SCSI host 1 Jun 18 12:09:07 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: firmware type : Legacy(64 VD) firmware Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: controller type : iMR(0MB) Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: Online Controller Reset(OCR) : Enabled Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: Secure JBOD support : No Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: NVMe passthru support : No Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: INIT adapter done Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: Jbod map is not supported megasas_setup_jbod_map 5146 Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: pci id : (0x1000)/(0x005f)/(0x1028)/(0x1f4b) Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: unevenspan support : yes Jun 18 12:09:07 fir-md1-s1 kernel: 
megaraid_sas 0000:c1:00.0: firmware crash dump : no Jun 18 12:09:07 fir-md1-s1 kernel: megaraid_sas 0000:c1:00.0: jbod sync map : no Jun 18 12:09:07 fir-md1-s1 kernel: scsi host1: Avago SAS based MegaRAID driver Jun 18 12:09:07 fir-md1-s1 kernel: scsi 1:2:0:0: Direct-Access DELL PERC H330 Mini 4.29 PQ: 0 ANSI: 5 Jun 18 12:09:07 fir-md1-s1 kernel: tg3.c:v3.137 (May 11, 2014) Jun 18 12:09:07 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: version 3.0 Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 167 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 168 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 169 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 170 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 171 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 172 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 173 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 174 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 175 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 176 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 177 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 178 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 179 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 180 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 181 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: irq 182 for MSI/MSI-X Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode Jun 18 12:09:07 fir-md1-s1 kernel: ahci 0000:86:00.2: flags: 64bit ncq sntf 
ilck pm led clo only pmp fbs pio slum part Jun 18 12:09:07 fir-md1-s1 kernel: scsi host2: ahci Jun 18 12:09:07 fir-md1-s1 kernel: ata1: SATA max UDMA/133 abar m4096@0xc0a02000 port 0xc0a02100 irq 167 Jun 18 12:09:07 fir-md1-s1 kernel: tg3 0000:81:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address d0:94:66:34:4a:7d Jun 18 12:09:07 fir-md1-s1 kernel: tg3 0000:81:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1]) Jun 18 12:09:07 fir-md1-s1 kernel: tg3 0000:81:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1] Jun 18 12:09:07 fir-md1-s1 kernel: tg3 0000:81:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit] Jun 18 12:09:08 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: firmware version: 12.24.1000 Jun 18 12:09:08 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link) Jun 18 12:09:08 fir-md1-s1 kernel: tg3 0000:81:00.1 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address d0:94:66:34:4a:7e Jun 18 12:09:08 fir-md1-s1 kernel: tg3 0000:81:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1]) Jun 18 12:09:08 fir-md1-s1 kernel: tg3 0000:81:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1] Jun 18 12:09:08 fir-md1-s1 kernel: tg3 0000:81:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit] Jun 18 12:09:08 fir-md1-s1 kernel: ata1: SATA link down (SStatus 0 SControl 300) Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 185 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 186 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 187 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 188 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 189 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 190 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 191 for 
MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 192 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 193 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 194 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 195 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 196 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 197 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 198 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 199 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 200 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 201 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 202 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 203 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 204 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 205 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 206 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 207 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 208 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 209 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 210 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 211 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 212 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 213 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 214 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 215 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 
0000:84:00.0: irq 216 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 217 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 218 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 219 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 220 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 221 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 222 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 223 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 224 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 225 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 226 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 227 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 228 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 229 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 230 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 231 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 232 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 233 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 234 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 235 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: irq 236 for MSI/MSI-X Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: Port module event: module 0, Cable plugged Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: hba_port entry: ffff8f3536ddec00, port: 255 is added to hba_port list Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b00db90c00), phys(17) Jun 18 
12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0011), sas_address(0x510600b00db90c00), phy(16) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0011), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0011), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: Enclosure LSI virtualSES 02 PQ: 0 ANSI: 6 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: set ignore_delay_remove for handle(0x0011) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00db90c00), phy(16), device_name(0x510600b00db90c00) Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: FW Tracer Owner Jun 18 12:09:10 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: enclosure logical id(0x500605b00db90c00), slot(16) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: enclosure level(0x0000), connector name( C3 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: serial_number(500605B00DB90C00) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:0:0: qdepth(1), tagged(0), simple(0), ordered(0), scsi_level(7), cmd_que(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206) Jun 18 12:09:10 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:10 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v4.5-1.0.1 Jun 18 12:09:10 fir-md1-s1 kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v4.5-1.0.1 Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0012), sas_address(0x500a0984db2fa920), phy(8) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: 
REPORT_LUNS: handle(0x0012), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(1) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0012), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0012), sas_address(0x500a0984db2fa920), phy(8) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0012), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0012), sas_address(0x500a0984db2fa920), phy(8) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0012), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:0: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:0: enclosure logical id(0x500605b00db90c00), slot(5) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:0: enclosure level(0x0000), connector name( C1 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:0: serial_number(021815000354 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920) Jun 18 12:09:10 fir-md1-s1 kernel: random: crng init done Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: enclosure logical id(0x500605b00db90c00), slot(5) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: enclosure level(0x0000), connector name( C1 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: serial_number(021815000354 ) Jun 18 
12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:1: Mode parameters changed Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: enclosure logical id(0x500605b00db90c00), slot(5) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: enclosure level(0x0000), connector name( C1 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: serial_number(021815000354 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:2: Mode parameters changed Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:31: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:31: enclosure logical id(0x500605b00db90c00), slot(5) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:31: enclosure level(0x0000), connector name( C1 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:31: serial_number(021815000354 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:1:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0013), sas_address(0x500a0984dfa1fa20), phy(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(1) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0013), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0013), 
sas_address(0x500a0984dfa1fa20), phy(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0013), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0013), sas_address(0x500a0984dfa1fa20), phy(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0013), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:0: SSP: handle(0x0013), sas_addr(0x500a0984dfa1fa20), phy(0), device_name(0x500a0984dfa1fa20) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:0: enclosure logical id(0x500605b00db90c00), slot(13) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:0: enclosure level(0x0000), connector name( C3 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:0: serial_number(021825001369 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: SSP: handle(0x0013), sas_addr(0x500a0984dfa1fa20), phy(0), device_name(0x500a0984dfa1fa20) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: enclosure logical id(0x500605b00db90c00), slot(13) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: enclosure level(0x0000), connector name( C3 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: serial_number(021825001369 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:1: Mode parameters changed Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:31: SSP: 
handle(0x0013), sas_addr(0x500a0984dfa1fa20), phy(0), device_name(0x500a0984dfa1fa20) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:31: enclosure logical id(0x500605b00db90c00), slot(13) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:31: enclosure level(0x0000), connector name( C3 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:31: serial_number(021825001369 ) Jun 18 12:09:10 fir-md1-s1 kernel: scsi 0:0:2:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0014), sas_address(0x500a0984da0f9b14), phy(12) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(1) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0014), lun(0) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0014), sas_address(0x500a0984da0f9b14), phy(12) Jun 18 12:09:10 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(0) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0014), lun(0) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:0: SSP: handle(0x0014), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:0: enclosure logical id(0x500605b00db90c00), slot(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:0: enclosure level(0x0000), connector name( C0 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:0: serial_number(021812047179 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:1: SSP: handle(0x0014), sas_addr(0x500a0984da0f9b14), phy(12), 
device_name(0x500a0984da0f9b14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:1: enclosure logical id(0x500605b00db90c00), slot(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:1: enclosure level(0x0000), connector name( C0 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:1: serial_number(021812047179 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:2: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:2: SSP: handle(0x0014), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:2: enclosure logical id(0x500605b00db90c00), slot(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:2: enclosure level(0x0000), connector name( C0 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:2: serial_number(021812047179 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:2: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:31: SSP: handle(0x0014), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:31: enclosure logical id(0x500605b00db90c00), slot(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:31: enclosure level(0x0000), connector name( C0 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:31: serial_number(021812047179 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:3:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0015), sas_address(0x500a0984dfa20c14), phy(4) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0015), retries(0) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0015), retries(1) Jun 18 
12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0015), lun(0) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: detecting: handle(0x0015), sas_address(0x500a0984dfa20c14), phy(4) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: REPORT_LUNS: handle(0x0015), retries(0) Jun 18 12:09:11 fir-md1-s1 kernel: mpt3sas_cm0: TEST_UNIT_READY: handle(0x0015), lun(0) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:0: SSP: handle(0x0015), sas_addr(0x500a0984dfa20c14), phy(4), device_name(0x500a0984dfa20c14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:0: enclosure logical id(0x500605b00db90c00), slot(9) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:0: enclosure level(0x0000), connector name( C2 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:0: serial_number(021825001558 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:1: SSP: handle(0x0015), sas_addr(0x500a0984dfa20c14), phy(4), device_name(0x500a0984dfa20c14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:1: enclosure logical id(0x500605b00db90c00), slot(9) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:1: enclosure level(0x0000), connector name( C2 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:1: serial_number(021825001558 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:31: SSP: handle(0x0015), sas_addr(0x500a0984dfa20c14), phy(4), device_name(0x500a0984dfa20c14) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:31: enclosure logical id(0x500605b00db90c00), slot(9) Jun 18 12:09:11 fir-md1-s1 
kernel: scsi 0:0:4:31: enclosure level(0x0000), connector name( C2 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:31: serial_number(021825001558 ) Jun 18 12:09:11 fir-md1-s1 kernel: scsi 0:0:4:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Jun 18 12:09:16 fir-md1-s1 kernel: mpt3sas_cm0: port enable: SUCCESS Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:1:0: rdac: LUN 0 (IOSHIP) (owned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:0: [sda] 926167040 512-byte logical blocks: (474 GB/441 GiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:0: [sda] 4096-byte physical blocks Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:1:1: rdac: LUN 1 (IOSHIP) (unowned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:0: [sda] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:1: [sdb] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:0: [sda] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:1:2: rdac: LUN 2 (IOSHIP) (owned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:2: [sdc] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:2:0: rdac: LUN 0 (IOSHIP) (owned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:2: [sdc] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:2: [sdc] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:0: [sdd] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:2:1: rdac: LUN 1 (IOSHIP) (unowned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:0: [sdd] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:0: [sdd] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:1: [sde] 37449707520 
512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:3:0: rdac: LUN 0 (IOSHIP) (unowned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:0: [sdf] 926167040 512-byte logical blocks: (474 GB/441 GiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:0: [sdf] 4096-byte physical blocks Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:1: [sde] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:1: [sde] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:1: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:3:1: rdac: LUN 1 (IOSHIP) (owned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:0: [sdf] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:0: [sdf] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:1: [sdg] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:0: [sda] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:3:2: rdac: LUN 2 (IOSHIP) (unowned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:1: [sdg] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:1: [sdg] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:2: [sdc] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:2: [sdh] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:1: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:4:0: rdac: LUN 0 (IOSHIP) (unowned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:2: [sdh] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:2: [sdh] Mode Sense: 83 00 10 08 Jun 18 12:09:16 
fir-md1-s1 kernel: sd 0:0:4:0: [sdi] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:2: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:0: [sdd] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: scsi 0:0:4:1: rdac: LUN 1 (IOSHIP) (owned) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:0: [sdi] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:0: [sdi] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:1: [sdj] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: sd 1:2:0:0: [sdk] 233308160 512-byte logical blocks: (119 GB/111 GiB) Jun 18 12:09:16 fir-md1-s1 kernel: sd 1:2:0:0: [sdk] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 1:2:0:0: [sdk] Mode Sense: 1f 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 1:2:0:0: [sdk] Write cache: disabled, read cache: disabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:1: [sdj] Write Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:1: [sdj] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:1: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:1: [sdg] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:0: [sdf] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sdk: sdk1 sdk2 sdk3 Jun 18 12:09:16 fir-md1-s1 kernel: sd 1:2:0:0: [sdk] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:2:1: [sde] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:3:2: [sdh] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:1: [sdj] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:4:0: [sdi] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:1: [sdb] Write 
Protect is off Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:1: [sdb] Mode Sense: 83 00 10 08 Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA Jun 18 12:09:16 fir-md1-s1 kernel: sd 0:0:1:1: [sdb] Attached SCSI disk Jun 18 12:09:16 fir-md1-s1 kernel: EXT4-fs (sdk2): mounted filesystem with ordered data mode. Opts: (null) Jun 18 12:09:17 fir-md1-s1 systemd-journald[357]: Received SIGTERM from PID 1 (systemd). Jun 18 12:09:17 fir-md1-s1 kernel: SELinux: Disabled at runtime. Jun 18 12:09:17 fir-md1-s1 kernel: SELinux: Unregistering netfilter hooks Jun 18 12:09:17 fir-md1-s1 kernel: type=1404 audit(1560884956.963:2): selinux=0 auid=4294967295 ses=4294967295 Jun 18 12:09:17 fir-md1-s1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team Jun 18 12:09:17 fir-md1-s1 systemd[1]: Inserted module 'ip_tables' Jun 18 12:09:17 fir-md1-s1 kernel: EXT4-fs (sdk2): re-mounted. Opts: (null) Jun 18 12:09:17 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:09:17 fir-md1-s1 kernel: knem 1.1.3.90mlnx1: initialized Jun 18 12:09:17 fir-md1-s1 kernel: ACPI Error: No handler for Region [SYSI] (ffff8f35e9e82b40) [IPMI] (20130517/evregion-162) Jun 18 12:09:17 fir-md1-s1 kernel: ACPI Error: Jun 18 12:09:17 fir-md1-s1 kernel: Region IPMI (ID=7) has no handler Jun 18 12:09:17 fir-md1-s1 kernel: (20130517/exfldio-305) Jun 18 12:09:17 fir-md1-s1 kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff8f15e9e7b5a0), AE_NOT_EXIST (20130517/psparse-536) Jun 18 12:09:17 fir-md1-s1 kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff8f15e9e7b500), AE_NOT_EXIST (20130517/psparse-536) Jun 18 12:09:17 fir-md1-s1 kernel: ACPI Exception: AE_NOT_EXIST, Evaluating _PMC (20130517/power_meter-753) Jun 18 12:09:17 fir-md1-s1 kernel: ipmi message handler version 39.2 Jun 18 12:09:17 fir-md1-s1 kernel: 
piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 Jun 18 12:09:17 fir-md1-s1 kernel: piix4_smbus 0000:00:14.0: Using register 0x2e for SMBus port selection Jun 18 12:09:17 fir-md1-s1 kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 13 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:1:0: Attached scsi generic sg1 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: 3 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: irq 238 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: irq 239 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: Queue 2 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: Queue 3 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: Queue 4 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: Queue 0 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: Queue 1 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: Queue 2 gets LSB 6 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:02:00.2: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: 5 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: irq 241 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 0 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 1 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 2 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 3 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 4 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 0 gets LSB 1 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 1 gets LSB 2 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 2 gets LSB 3 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 3 gets LSB 4 Jun 18 12:09:17 
fir-md1-s1 kernel: ccp 0000:03:00.1: Queue 4 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:03:00.1: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: 3 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: irq 243 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: irq 244 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: Queue 2 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: Queue 3 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: Queue 4 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: Queue 0 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: Queue 1 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: Queue 2 gets LSB 6 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:41:00.2: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: 5 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: irq 246 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 0 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 1 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 2 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 3 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 4 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 0 gets LSB 1 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 1 gets LSB 2 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 2 gets LSB 3 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 3 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: Queue 4 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:42:00.1: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: 3 command queues available Jun 18 12:09:17 fir-md1-s1 
kernel: ccp 0000:85:00.2: irq 248 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: irq 249 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: Queue 2 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: Queue 3 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: Queue 4 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: Queue 0 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: Queue 1 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: Queue 2 gets LSB 6 Jun 18 12:09:17 fir-md1-s1 kernel: ipmi device interface Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:85:00.2: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: 5 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: irq 251 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 0 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 1 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 2 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 3 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 4 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 0 gets LSB 1 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 1 gets LSB 2 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 2 gets LSB 3 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 3 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: Queue 4 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:1:1: Attached scsi generic sg2 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:1:2: Attached scsi generic sg3 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: scsi 0:0:1:31: Attached scsi generic sg4 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:2:0: Attached scsi generic sg5 type 0 Jun 18 12:09:17 
fir-md1-s1 kernel: sd 0:0:2:1: Attached scsi generic sg6 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: scsi 0:0:2:31: Attached scsi generic sg7 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:3:0: Attached scsi generic sg8 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:3:1: Attached scsi generic sg9 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:3:2: Attached scsi generic sg10 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: scsi 0:0:3:31: Attached scsi generic sg11 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:4:0: Attached scsi generic sg12 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 0:0:4:1: Attached scsi generic sg13 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: scsi 0:0:4:31: Attached scsi generic sg14 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: sd 1:2:0:0: Attached scsi generic sg15 type 0 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:86:00.1: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: 3 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: irq 253 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: irq 254 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: Queue 2 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: Queue 3 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: Queue 4 can access 4 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: Queue 0 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: Queue 1 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: Queue 2 gets LSB 6 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c2:00.2: enabled Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: 5 command queues available Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: irq 256 for MSI/MSI-X Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 0 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 1 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 
0000:c3:00.1: Queue 2 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 3 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 4 can access 7 LSB regions Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 0 gets LSB 1 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 1 gets LSB 2 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 2 gets LSB 3 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 3 gets LSB 4 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: Queue 4 gets LSB 5 Jun 18 12:09:17 fir-md1-s1 kernel: ccp 0000:c3:00.1: enabled Jun 18 12:09:17 fir-md1-s1 kernel: device-mapper: uevent: version 1.0.3 Jun 18 12:09:17 fir-md1-s1 kernel: IPMI System Interface driver. Jun 18 12:09:17 fir-md1-s1 kernel: input: PC Speaker as /devices/platform/pcspkr/input/input2 Jun 18 12:09:17 fir-md1-s1 kernel: device-mapper: ioctl: 4.37.1-ioctl (2018-04-03) initialised: dm-devel@redhat.com Jun 18 12:09:17 fir-md1-s1 kernel: ipmi_si ipmi_si.0: ipmi_platform: probing via SMBIOS Jun 18 12:09:17 fir-md1-s1 kernel: ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 10 Jun 18 12:09:17 fir-md1-s1 kernel: ipmi_si: Adding SMBIOS-specified kcs state machine Jun 18 12:09:17 fir-md1-s1 kernel: ipmi_si IPI0001:00: ipmi_platform: probing via ACPI Jun 18 12:09:17 fir-md1-s1 kernel: ipmi_si IPI0001:00: [io 0x0ca8] regsize 1 spacing 4 irq 10 Jun 18 12:09:18 fir-md1-s1 kernel: mpt3sas_cm0: log_info(0x31200205): originator(PL), code(0x20), sub_code(0x0205) Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si ipmi_si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si: Adding ACPI-specified kcs state machine Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 10 Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:1:0: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: cryptd: 
max_cpu_qlen set to 1000 Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:1:1: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:1:2: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si IPI0001:00: The BMC does not support setting the recv irq bit, compensating, but the BMC needs to be fixed. Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si IPI0001:00: Using irq 10 Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si IPI0001:00: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20) Jun 18 12:09:18 fir-md1-s1 kernel: AVX2 version of gcm_enc/dec engaged. Jun 18 12:09:18 fir-md1-s1 kernel: scsi 0:0:1:31: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:2:0: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:2:1: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: AES CTR mode by8 optimization enabled Jun 18 12:09:18 fir-md1-s1 kernel: scsi 0:0:2:31: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: ipmi_si IPI0001:00: IPMI kcs interface initialized Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:3:0: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni) Jun 18 12:09:18 fir-md1-s1 kernel: alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni) Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:3:1: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:3:2: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: kvm: Nested Paging enabled Jun 18 12:09:18 fir-md1-s1 kernel: scsi 0:0:3:31: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:4:0: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: sd 0:0:4:1: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: scsi 0:0:4:31: Embedded Enclosure Device Jun 18 12:09:18 fir-md1-s1 kernel: ses 0:0:0:0: Attached Enclosure device Jun 18 12:09:18 fir-md1-s1 kernel: MCE: In-kernel MCE decoding enabled. 
Jun 18 12:09:18 fir-md1-s1 kernel: AMD64 EDAC driver v3.4.0 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: DRAM ECC enabled. Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: F17h detected (node 0). Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC: UMC0 chip selects: Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC: UMC1 chip selects: Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: using x8 syndromes. Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MCT channel count: 2 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC0: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:18.3 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: DRAM ECC enabled. Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: F17h detected (node 1). Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC: UMC0 chip selects: Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC: UMC1 chip selects: Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: using x8 syndromes. 
Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MCT channel count: 2 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC1: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:19.3 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: DRAM ECC enabled. Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: F17h detected (node 2). Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC: UMC0 chip selects: Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC: UMC1 chip selects: Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: using x8 syndromes. Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: MCT channel count: 2 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC MC2: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:1a.3 Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: DRAM ECC enabled. Jun 18 12:09:18 fir-md1-s1 kernel: EDAC amd64: F17h detected (node 3). 
Jun 18 12:09:19 fir-md1-s1 kernel: EDAC MC: UMC0 chip selects: Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC MC: UMC1 chip selects: Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 0: 0MB 1: 0MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 2: 32767MB 3: 32767MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 4: 0MB 5: 0MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MC: 6: 0MB 7: 0MB Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: using x8 syndromes. Jun 18 12:09:19 fir-md1-s1 kernel: EDAC amd64: MCT channel count: 2 Jun 18 12:09:19 fir-md1-s1 kernel: EDAC MC3: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:1b.3 Jun 18 12:09:19 fir-md1-s1 kernel: EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.0' (POLLED) Jun 18 12:09:20 fir-md1-s1 kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.3) Jun 18 12:09:36 fir-md1-s1 kernel: device-mapper: multipath round-robin: version 1.2.0 loaded Jun 18 12:10:03 fir-md1-s1 kernel: Adding 4194300k swap on /dev/sdk3. Priority:-2 extents:1 across:4194300k FS Jun 18 12:10:04 fir-md1-s1 kernel: FAT-fs (sdk1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck. Jun 18 12:10:04 fir-md1-s1 kernel: type=1305 audit(1560885004.098:3): audit_pid=17942 old=0 auid=4294967295 ses=4294967295 res=1 Jun 18 12:10:04 fir-md1-s1 kernel: RPC: Registered named UNIX socket transport module. Jun 18 12:10:04 fir-md1-s1 kernel: RPC: Registered udp transport module. Jun 18 12:10:04 fir-md1-s1 kernel: RPC: Registered tcp transport module. Jun 18 12:10:04 fir-md1-s1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module. 
Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:04 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: slow_pci_heuristic:5202:(pid 18275): Max link speed = 100000, PCI BW = 126016 Jun 18 12:10:04 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(64) RxCqeCmprss(0) Jun 18 12:10:04 fir-md1-s1 kernel: mlx5_core 0000:84:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(64) RxCqeCmprss(0) Jun 18 12:10:05 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:05 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:05 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:05 fir-md1-s1 kernel: Request for unknown module key 'Mellanox Technologies signing key: 
61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Jun 18 12:10:05 fir-md1-s1 kernel: tg3 0000:81:00.0: irq 257 for MSI/MSI-X Jun 18 12:10:05 fir-md1-s1 kernel: tg3 0000:81:00.0: irq 258 for MSI/MSI-X Jun 18 12:10:05 fir-md1-s1 kernel: tg3 0000:81:00.0: irq 259 for MSI/MSI-X Jun 18 12:10:05 fir-md1-s1 kernel: tg3 0000:81:00.0: irq 260 for MSI/MSI-X Jun 18 12:10:05 fir-md1-s1 kernel: tg3 0000:81:00.0: irq 261 for MSI/MSI-X Jun 18 12:10:05 fir-md1-s1 kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready Jun 18 12:10:09 fir-md1-s1 kernel: tg3 0000:81:00.0 em1: Link is up at 1000 Mbps, full duplex Jun 18 12:10:09 fir-md1-s1 kernel: tg3 0000:81:00.0 em1: Flow control is on for TX and on for RX Jun 18 12:10:09 fir-md1-s1 kernel: tg3 0000:81:00.0 em1: EEE is enabled Jun 18 12:10:09 fir-md1-s1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready Jun 18 12:10:09 fir-md1-s1 kernel: IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready Jun 18 12:10:09 fir-md1-s1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready Jun 18 12:10:14 fir-md1-s1 kernel: FS-Cache: Loaded Jun 18 12:10:14 fir-md1-s1 kernel: FS-Cache: Netfs 'nfs' registered for caching Jun 18 12:10:14 fir-md1-s1 kernel: Key type dns_resolver registered Jun 18 12:10:14 fir-md1-s1 kernel: NFS: Registering the id_resolver key type Jun 18 12:10:14 fir-md1-s1 kernel: Key type id_resolver registered Jun 18 12:10:14 fir-md1-s1 kernel: Key type id_legacy registered Jun 18 12:10:46 fir-md1-s1 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Jun 18 12:10:46 fir-md1-s1 kernel: alg: No test for adler32 (adler32-zlib) Jun 18 12:10:47 fir-md1-s1 kernel: Lustre: Lustre: Build Version: 2.12.0_10_g4f75199 Jun 18 12:10:47 fir-md1-s1 kernel: LNet: Using FastReg for registration Jun 18 12:10:47 fir-md1-s1 kernel: LNetError: 7269:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.201@o2ib7 on NA (ib0:0:10.0.10.51): bad dst nid 10.0.10.51@o2ib7 Jun 18 12:10:47 fir-md1-s1 kernel: 
LNet: Added LNI 10.0.10.51@o2ib7 [8/256/0/180] Jun 18 12:11:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect:321: MMP interval 42 higher than expected, please wait. Jun 18 12:12:00 fir-md1-s1 kernel: LDISKFS-fs (dm-0): recovery complete Jun 18 12:12:00 fir-md1-s1 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Jun 18 12:12:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fbd75c70-2700-1de1-4de7-0793c5782012 (at 0@lo) Jun 18 12:12:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c579ffa9-959a-5f2e-006d-9d0dfdb5fa5a (at 10.8.17.26@o2ib6) Jun 18 12:12:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7888454b-080b-6943-cf4c-416d31bde0ec (at 10.9.104.28@o2ib4) Jun 18 12:12:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Jun 18 12:12:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 18 12:12:14 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-3): ldiskfs_multi_mount_protect:321: MMP interval 42 higher than expected, please wait. 
Jun 18 12:12:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 323e9462-2806-288b-427b-09b4875db405 (at 10.0.10.52@o2ib7) Jun 18 12:12:16 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 18 12:12:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 14f72e02-ef06-defc-fe30-356a14ef5fda (at 10.9.109.29@o2ib4) Jun 18 12:12:25 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 18 12:12:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6fb1a9aa-6234-c00b-63b2-a1a72639773f (at 10.8.7.19@o2ib6) Jun 18 12:12:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jun 18 12:12:56 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Jun 18 12:12:57 fir-md1-s1 kernel: LDISKFS-fs (dm-3): recovery complete Jun 18 12:12:57 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Jun 18 12:12:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.2.34@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 18 12:12:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.2.31@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 18 12:12:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jun 18 12:12:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.9.104.34@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 18 12:12:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 18 12:13:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.67@o2ib6, removing former export from same NID Jun 18 12:13:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.8.37@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 18 12:13:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jun 18 12:13:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.30.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 18 12:13:07 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jun 18 12:13:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.113.9@o2ib4, removing former export from same NID Jun 18 12:13:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8990d314-6074-cf8b-1427-2287d94d8719 (at 10.8.23.29@o2ib6) Jun 18 12:13:14 fir-md1-s1 kernel: Lustre: Skipped 792 previous similar messages Jun 18 12:13:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.109.29@o2ib4, removing former export from same NID Jun 18 12:13:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 18 12:13:15 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jun 18 12:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.0.67@o2ib6 (not set up) Jun 18 12:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.108.60@o2ib4 (not set up) Jun 18 12:13:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 18 12:13:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.101.58@o2ib4 (not set up) Jun 18 12:13:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 18 12:13:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900 Jun 18 12:13:27 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Jun 18 12:13:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1400 clients reconnect Jun 18 12:13:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.113.10@o2ib4, removing former export from same NID Jun 18 12:13:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 18 12:13:31 fir-md1-s1 kernel: LustreError: Skipped 505 previous similar messages Jun 18 12:13:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.104.14@o2ib4, removing former export from same NID Jun 18 12:13:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.104.28@o2ib4, removing former export from same NID Jun 18 12:13:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 18 12:14:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.19@o2ib6, removing former export from same NID Jun 18 12:14:04 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jun 18 12:14:05 fir-md1-s1 kernel: LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5 Jun 18 12:14:05 fir-md1-s1 kernel: LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Jun 18 12:14:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.105.33@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 18 12:14:05 fir-md1-s1 kernel: LustreError: Skipped 939 previous similar messages Jun 18 12:14:06 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114 Jun 18 12:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery not enabled, recovery window 300-900 Jun 18 12:14:06 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on Jun 18 12:14:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1399 clients reconnect Jun 18 12:14:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Denying connection for new client 1e1769d3-ffba-a4ec-e5e5-cf0cf094a85d(at 10.8.8.37@o2ib6), waiting for 1400 known clients (3 recovered, 1352 in progress, and 0 evicted) already passed deadline 0:49 Jun 18 12:14:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 62d964d5-e241-336a-a44f-d2f1a33459f3 (at 10.9.105.41@o2ib4) Jun 18 12:14:18 fir-md1-s1 kernel: Lustre: Skipped 2049 previous similar messages Jun 18 12:14:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jun 18 12:14:38 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jun 18 12:14:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:15, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:16, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:14:44 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 18 12:14:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:17, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. 
Jun 18 12:14:45 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 18 12:14:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:19, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:14:47 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 18 12:14:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:23, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:14:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jun 18 12:15:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:32, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:15:00 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jun 18 12:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 1:48, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:15:16 fir-md1-s1 kernel: Lustre: Skipped 166 previous similar messages Jun 18 12:15:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Denying connection for new client 1e1769d3-ffba-a4ec-e5e5-cf0cf094a85d(at 10.8.8.37@o2ib6), waiting for 1400 known clients (4 recovered, 1395 in progress, and 0 evicted) already passed deadline 2:04 Jun 18 12:15:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.19@o2ib6, removing former export from same NID Jun 18 12:15:44 fir-md1-s1 kernel: Lustre: Skipped 1374 previous similar messages Jun 18 12:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 2:21, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. 
Jun 18 12:15:49 fir-md1-s1 kernel: Lustre: Skipped 1094 previous similar messages Jun 18 12:15:52 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c579ffa9-959a-5f2e-006d-9d0dfdb5fa5a (at 10.8.17.26@o2ib6) in 229 seconds. I think it's dead, and I am evicting it. exp ffff8f453ecb7000, cur 1560885352 expire 1560885202 last 1560885123 Jun 18 12:16:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to aeb5fb0d-c687-a142-6a18-62fe99255a89 (at 10.8.30.8@o2ib6) Jun 18 12:16:26 fir-md1-s1 kernel: Lustre: Skipped 5929 previous similar messages Jun 18 12:16:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Denying connection for new client 1e1769d3-ffba-a4ec-e5e5-cf0cf094a85d(at 10.8.8.37@o2ib6), waiting for 1400 known clients (4 recovered, 1395 in progress, and 0 evicted) already passed deadline 3:20 Jun 18 12:16:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 3:25, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:16:53 fir-md1-s1 kernel: Lustre: Skipped 1632 previous similar messages Jun 18 12:17:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 00a27512-1ff4-ce80-3c1f-4cfb4021ea64 (at 10.8.31.9@o2ib6) in 229 seconds. I think it's dead, and I am evicting it. 
exp ffff8f14e8c40400, cur 1560885431 expire 1560885281 last 1560885202 Jun 18 12:17:11 fir-md1-s1 kernel: Lustre: Skipped 1312 previous similar messages Jun 18 12:17:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.106.26@o2ib4, removing former export from same NID Jun 18 12:17:54 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jun 18 12:18:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: recovery is timed out, evict stale exports Jun 18 12:18:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: disconnecting 1 stale clients Jun 18 12:18:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Denying connection for new client 1e1769d3-ffba-a4ec-e5e5-cf0cf094a85d(at 10.8.8.37@o2ib6), waiting for 1400 known clients (4 recovered, 1395 in progress, and 1 evicted) already passed deadline 4:00 Jun 18 12:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery already passed deadline 4:48, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Jun 18 12:19:01 fir-md1-s1 kernel: Lustre: Skipped 4511 previous similar messages Jun 18 12:19:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: recovery is timed out, evict stale exports Jun 18 12:19:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: disconnecting 1396 stale clients Jun 18 12:19:13 fir-md1-s1 kernel: LustreError: 20943:0:(tgt_grant.c:248:tgt_grant_sanity_check()) mdt_obd_disconnect: tot_granted 2097152 != fo_tot_granted 94371840 Jun 18 12:19:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 5:01, of 1400 clients 4 recovered and 1396 were evicted. 
Jun 18 12:19:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: recovery is timed out, evict stale exports Jun 18 12:19:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: disconnecting 1395 stale clients Jun 18 12:19:28 fir-md1-s1 kernel: LustreError: 20712:0:(tgt_grant.c:248:tgt_grant_sanity_check()) mdt_obd_disconnect: tot_granted 2097152 != fo_tot_granted 89374720 Jun 18 12:19:28 fir-md1-s1 kernel: LustreError: 20712:0:(tgt_grant.c:248:tgt_grant_sanity_check()) Skipped 43 previous similar messages Jun 18 12:19:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 6:01, of 1400 clients 4 recovered and 1396 were evicted. Jun 18 12:19:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8f150adbd000, cur 1560885585 expire 1560885435 last 1560885353 Jun 18 12:19:45 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jun 18 12:20:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 33d1aa9e-637b-a4b6-149b-4554121b9703 (at 10.9.109.56@o2ib4) reconnecting Jun 18 12:20:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 825512d6-7433-1c74-485b-b1a59d9ea8c8 (at 10.8.8.34@o2ib6) reconnecting Jun 18 12:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 564d73ec-5593-7fd1-5465-b4305978ee16 (at 10.8.17.8@o2ib6) reconnecting Jun 18 12:20:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 18 12:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c24b8bff-f99c-4849-767d-bb11ab7dd32c (at 10.9.104.34@o2ib4) reconnecting Jun 18 12:20:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 18 12:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2cc0bc1b-7a1f-9dab-b36c-c6206a02385d (at 10.8.20.20@o2ib6) reconnecting Jun 18 12:20:28 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 18 12:20:34 fir-md1-s1 kernel: LustreError: 21498:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 49152 Jun 18 12:20:34 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 118784 Jun 18 12:20:34 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 18 12:20:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9dfc2bda-cf66-13a5-c506-30cd55e4267b (at 10.9.108.17@o2ib4) reconnecting Jun 18 12:20:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 18 12:20:39 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 18 12:20:39 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 26 previous similar messages Jun 18 12:20:41 fir-md1-s1 kernel: LustreError: 20510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 18 12:20:41 fir-md1-s1 kernel: LustreError: 20510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 33 previous similar messages Jun 18 12:20:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 92d3fcc8-c103-27a1-5dc1-c88d1de34211 (at 10.8.12.30@o2ib6) Jun 18 12:20:42 fir-md1-s1 kernel: Lustre: Skipped 12151 previous similar messages Jun 18 12:20:49 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 18 12:20:49 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 263 previous similar messages Jun 18 12:20:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 18 12:20:55 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jun 18 12:20:59 fir-md1-s1 kernel: 
LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 12:20:59 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 70 previous similar messages Jun 18 12:21:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8257ad81-12d5-f269-3c44-478c2a180d99 (at 10.8.17.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14b3de2000, cur 1560885663 expire 1560885513 last 1560885436 Jun 18 12:21:03 fir-md1-s1 kernel: Lustre: Skipped 1302 previous similar messages Jun 18 12:21:15 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 18 12:21:15 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 32 previous similar messages Jun 18 12:21:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2f3fe604-ebdc-987d-cf70-34fded524b5d (at 10.8.21.23@o2ib6) reconnecting Jun 18 12:21:27 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jun 18 12:21:49 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 18 12:21:49 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 53 previous similar messages Jun 18 12:22:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.18@o2ib6, removing former export from same NID Jun 18 12:22:11 fir-md1-s1 kernel: Lustre: Skipped 1681 previous similar messages Jun 18 12:22:54 fir-md1-s1 kernel: LustreError: 21538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 12:22:54 fir-md1-s1 kernel: LustreError: 21538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 257 previous similar messages Jun 18 12:25:05 
fir-md1-s1 kernel: LustreError: 21541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 18 12:25:05 fir-md1-s1 kernel: LustreError: 21541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 223 previous similar messages Jun 18 12:29:29 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 12:29:29 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 230 previous similar messages Jun 18 12:30:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 18 12:30:01 fir-md1-s1 kernel: Lustre: Skipped 630 previous similar messages Jun 18 12:38:02 fir-md1-s1 kernel: LustreError: 21743:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 18 12:38:02 fir-md1-s1 kernel: LustreError: 21743:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 857 previous similar messages Jun 18 12:48:03 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 12:48:03 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1173 previous similar messages Jun 18 12:58:05 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 12:58:05 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 496 previous similar messages Jun 18 13:08:05 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 13:08:05 fir-md1-s1 kernel: LustreError: 
21684:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 254 previous similar messages Jun 18 13:08:56 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 13:18:16 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 13:18:16 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 524 previous similar messages Jun 18 13:28:17 fir-md1-s1 kernel: LustreError: 21793:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 13:28:17 fir-md1-s1 kernel: LustreError: 21793:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 18 13:38:21 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 18 13:38:21 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 484 previous similar messages Jun 18 13:48:26 fir-md1-s1 kernel: LustreError: 21742:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 18 13:48:26 fir-md1-s1 kernel: LustreError: 21742:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jun 18 13:52:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 18 13:52:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 18 13:52:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 18 13:52:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jun 18 13:58:29 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 13:58:29 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 388 previous similar messages Jun 18 14:07:21 fir-md1-s1 kernel: perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 Jun 18 14:08:30 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 14:08:30 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 320 previous similar messages Jun 18 14:19:38 fir-md1-s1 kernel: LustreError: 20510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 18 14:19:38 fir-md1-s1 kernel: LustreError: 20510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 222 previous similar messages Jun 18 14:29:44 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 14:29:44 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 191 previous similar messages Jun 18 14:35:28 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 14:39:55 fir-md1-s1 kernel: LustreError: 21566:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 18 14:39:55 fir-md1-s1 kernel: LustreError: 21566:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 283 previous similar messages Jun 18 14:49:55 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 18 14:49:55 fir-md1-s1 kernel: LustreError: 
21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 205 previous similar messages Jun 18 15:00:06 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 15:00:06 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 previous similar messages Jun 18 15:10:07 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 15:10:07 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 294 previous similar messages Jun 18 15:18:25 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 15:20:07 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 15:20:07 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 18 15:22:44 fir-md1-s1 kernel: perf: interrupt took too long (3155 > 3126), lowering kernel.perf_event_max_sample_rate to 63000 Jun 18 15:30:15 fir-md1-s1 kernel: LustreError: 21792:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 15:30:15 fir-md1-s1 kernel: LustreError: 21792:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 455 previous similar messages Jun 18 15:33:04 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 15:37:41 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 15:40:18 fir-md1-s1 kernel: LustreError: 
22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 15:40:18 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 388 previous similar messages Jun 18 15:43:59 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 15:43:59 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 18 15:46:02 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 15:50:41 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 18 15:50:41 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 421 previous similar messages Jun 18 16:00:44 fir-md1-s1 kernel: LustreError: 21794:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 18 16:00:44 fir-md1-s1 kernel: LustreError: 21794:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 332 previous similar messages Jun 18 16:01:14 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:01:14 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jun 18 16:05:43 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:10:50 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 16:10:50 fir-md1-s1 kernel: LustreError: 
22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 462 previous similar messages Jun 18 16:20:52 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 16:20:52 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 379 previous similar messages Jun 18 16:28:42 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:30:54 fir-md1-s1 kernel: LustreError: 22973:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 16:30:54 fir-md1-s1 kernel: LustreError: 22973:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 18 16:34:28 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:35:17 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:38:37 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:40:00 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:40:57 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 16:40:57 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 358 previous similar messages Jun 18 16:43:37 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:43:37 fir-md1-s1 kernel: Lustre: 
21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 18 16:46:33 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 16:50:59 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 18 16:50:59 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 375 previous similar messages Jun 18 16:51:37 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:01:01 fir-md1-s1 kernel: LustreError: 22973:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 17:01:01 fir-md1-s1 kernel: LustreError: 22973:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 285 previous similar messages Jun 18 17:04:42 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:05:23 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:05:23 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 18 17:07:27 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:07:27 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 18 17:08:25 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:09:38 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to 
clear the changelog for user 1: -22 Jun 18 17:09:38 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 18 17:11:05 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 18 17:11:05 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 252 previous similar messages Jun 18 17:19:55 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:19:55 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 18 17:21:22 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 17:21:22 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 244 previous similar messages Jun 18 17:28:53 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:28:53 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jun 18 17:29:10 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 18 17:29:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 545f12c1-4799-a254-b9c4-f75f43e1bc5b (at 10.8.27.23@o2ib6) reconnecting Jun 18 17:29:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to f2f779cf-d459-667d-6b56-c14a76db50bb (at 10.8.27.23@o2ib6) Jun 18 17:29:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 18 17:29:35 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform 
health checking (-125, 0) Jun 18 17:29:35 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jun 18 17:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 545f12c1-4799-a254-b9c4-f75f43e1bc5b (at 10.8.27.23@o2ib6) reconnecting Jun 18 17:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 29b52eb8-dab6-4b88-7a0d-057d59d63b47 (at 10.8.17.22@o2ib6) Jun 18 17:29:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 18 17:31:47 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 17:31:47 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 195 previous similar messages Jun 18 17:38:35 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:38:35 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages Jun 18 17:41:57 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 17:41:57 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 223 previous similar messages Jun 18 17:46:25 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 18 17:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 564d73ec-5593-7fd1-5465-b4305978ee16 (at 10.8.17.8@o2ib6) reconnecting Jun 18 17:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5132afab-5b1d-c7e5-9316-17cfeee10d24 (at 10.8.17.8@o2ib6) Jun 18 17:46:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 18 17:48:41 fir-md1-s1 kernel: Lustre: 
21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:48:41 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jun 18 17:51:57 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 18 17:51:57 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1009 previous similar messages Jun 18 17:59:30 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 17:59:30 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jun 18 18:02:07 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 18:02:07 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1677 previous similar messages Jun 18 18:10:18 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 18:10:18 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jun 18 18:12:09 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 18:12:09 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2381 previous similar messages Jun 18 18:21:54 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 18:21:54 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19 
previous similar messages Jun 18 18:22:10 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 18:22:10 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1508 previous similar messages Jun 18 18:31:56 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 18:31:56 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jun 18 18:32:10 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 18:32:10 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 819 previous similar messages Jun 18 18:42:14 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 18 18:42:14 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 974 previous similar messages Jun 18 18:42:38 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 18:42:38 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jun 18 18:52:14 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 18:52:14 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22541 previous similar messages Jun 18 18:53:10 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 
18:53:10 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 18 19:02:16 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 19:02:16 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 39085 previous similar messages Jun 18 19:04:06 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 19:04:06 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 75 previous similar messages Jun 18 19:12:16 fir-md1-s1 kernel: LustreError: 27482:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 18 19:12:16 fir-md1-s1 kernel: LustreError: 27482:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1486 previous similar messages Jun 18 19:16:32 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 19:16:32 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 24 previous similar messages Jun 18 19:22:20 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 19:22:20 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 976 previous similar messages Jun 18 19:27:34 fir-md1-s1 kernel: Lustre: 20501:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1509cf8050 x1636444251328064/t0(0) o3->2ca8c1ab-ca57-7d24-398b-275ee2691945@10.9.112.16@o2ib4:9/0 lens 488/440 e 0 to 0 dl 1560911259 ref 2 fl Interpret:/0/0 rc 0/0 Jun 18 19:27:40 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client 2ca8c1ab-ca57-7d24-398b-275ee2691945 (at 10.9.112.16@o2ib4) reconnecting Jun 18 19:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 286d4aef-dd39-033a-885a-1b2f68dad8ee (at 10.9.112.16@o2ib4) Jun 18 19:27:56 fir-md1-s1 kernel: LustreError: 20500:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1509cf8050 x1636444251328064/t0(0) o3->2ca8c1ab-ca57-7d24-398b-275ee2691945@10.9.112.16@o2ib4:9/0 lens 488/440 e 0 to 0 dl 1560911259 ref 1 fl Interpret:/0/0 rc 0/0 Jun 18 19:27:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 2ca8c1ab-ca57-7d24-398b-275ee2691945 (at 10.9.112.16@o2ib4), client will retry: rc -107 Jun 18 19:27:56 fir-md1-s1 kernel: Lustre: 20500:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:17s); client may timeout. req@ffff8f1509cf8050 x1636444251328064/t0(0) o3->2ca8c1ab-ca57-7d24-398b-275ee2691945@10.9.112.16@o2ib4:9/0 lens 488/440 e 0 to 0 dl 1560911259 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jun 18 19:28:14 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 19:28:14 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages Jun 18 19:32:28 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 18 19:32:28 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1433 previous similar messages Jun 18 19:38:27 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 19:38:27 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 62 previous similar messages Jun 18 19:42:28 fir-md1-s1 kernel: LustreError: 
27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 19:42:28 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1787 previous similar messages Jun 18 19:49:32 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 19:49:32 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 121 previous similar messages Jun 18 19:52:32 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 18 19:52:32 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 57799 previous similar messages Jun 18 20:00:22 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 20:00:22 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 22 previous similar messages Jun 18 20:03:16 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 20:03:16 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3013 previous similar messages Jun 18 20:10:56 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 20:10:56 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 27 previous similar messages Jun 18 20:13:18 fir-md1-s1 kernel: LustreError: 22058:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 20:13:18 fir-md1-s1 kernel: LustreError: 
22058:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 18 20:23:30 fir-md1-s1 kernel: LustreError: 21794:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 20:23:30 fir-md1-s1 kernel: LustreError: 21794:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 682 previous similar messages Jun 18 20:24:00 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 20:24:00 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 49 previous similar messages Jun 18 20:33:30 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 18 20:33:30 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 995 previous similar messages Jun 18 20:35:16 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 20:35:16 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages Jun 18 20:43:34 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 20:43:34 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 863 previous similar messages Jun 18 20:46:28 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 20:46:28 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 62 previous similar messages Jun 18 20:53:44 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 20:53:44 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 430 previous similar messages Jun 18 20:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a1402e9b-5e48-acd3-204d-e410e8c1eb0b (at 10.8.2.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f350405e800, cur 1560916663 expire 1560916513 last 1560916436 Jun 18 20:57:43 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 18 20:57:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a1402e9b-5e48-acd3-204d-e410e8c1eb0b (at 10.8.2.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f350405f400, cur 1560916666 expire 1560916516 last 1560916439 Jun 18 20:57:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 18 20:58:25 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 20:58:25 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 80 previous similar messages Jun 18 21:03:45 fir-md1-s1 kernel: LustreError: 23107:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 21:03:45 fir-md1-s1 kernel: LustreError: 23107:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 529 previous similar messages Jun 18 21:08:51 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 21:08:51 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 93 previous similar messages Jun 18 21:13:45 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 21:13:45 fir-md1-s1 kernel: LustreError: 
21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 330 previous similar messages Jun 18 21:18:52 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 21:18:52 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 111 previous similar messages Jun 18 21:23:46 fir-md1-s1 kernel: LustreError: 21792:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 21:23:46 fir-md1-s1 kernel: LustreError: 21792:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2183 previous similar messages Jun 18 21:31:54 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 21:31:54 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 41 previous similar messages Jun 18 21:33:46 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 18 21:33:46 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 563 previous similar messages Jun 18 21:43:50 fir-md1-s1 kernel: LustreError: 25972:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 18 21:43:50 fir-md1-s1 kernel: LustreError: 25972:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 775 previous similar messages Jun 18 21:44:42 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 21:44:42 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 105 previous similar messages Jun 18 21:53:54 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 21:53:54 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 810 previous similar messages Jun 18 21:55:16 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 21:55:16 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 67 previous similar messages Jun 18 22:03:55 fir-md1-s1 kernel: LustreError: 21545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 22:03:55 fir-md1-s1 kernel: LustreError: 21545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43806 previous similar messages Jun 18 22:08:03 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 22:08:03 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 378 previous similar messages Jun 18 22:13:58 fir-md1-s1 kernel: LustreError: 22058:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 18 22:13:58 fir-md1-s1 kernel: LustreError: 22058:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15736 previous similar messages Jun 18 22:18:12 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 22:18:12 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 264 previous similar messages Jun 18 22:24:02 fir-md1-s1 kernel: LustreError: 21792:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 94208 GRANT, real grant 0 Jun 18 22:24:02 fir-md1-s1 kernel: LustreError: 21792:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 310 previous similar messages Jun 18 22:28:15 
fir-md1-s1 kernel: Lustre: 20457:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 22:28:15 fir-md1-s1 kernel: Lustre: 20457:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 138 previous similar messages Jun 18 22:34:26 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 22:34:26 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 314 previous similar messages Jun 18 22:38:16 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 22:38:16 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 284 previous similar messages Jun 18 22:44:50 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 22:44:50 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 221 previous similar messages Jun 18 22:48:48 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 22:48:48 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 146 previous similar messages Jun 18 22:54:56 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 22:54:56 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 295 previous similar messages Jun 18 22:58:51 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 22:58:51 fir-md1-s1 kernel: Lustre: 
21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 219 previous similar messages Jun 18 23:05:07 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 23:05:07 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 336 previous similar messages Jun 18 23:13:24 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 23:13:24 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 142 previous similar messages Jun 18 23:15:58 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 23:15:58 fir-md1-s1 kernel: LustreError: 21684:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 220 previous similar messages Jun 18 23:26:01 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 23:26:01 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 203 previous similar messages Jun 18 23:26:16 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 23:26:16 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 111 previous similar messages Jun 18 23:36:03 fir-md1-s1 kernel: LustreError: 25972:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 18 23:36:03 fir-md1-s1 kernel: LustreError: 25972:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 265 previous similar messages Jun 18 23:37:49 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) 
fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 23:37:49 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 81 previous similar messages Jun 18 23:46:07 fir-md1-s1 kernel: LustreError: 20507:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 18 23:46:07 fir-md1-s1 kernel: LustreError: 20507:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 244 previous similar messages Jun 18 23:48:55 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 18 23:48:55 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 148 previous similar messages Jun 18 23:56:23 fir-md1-s1 kernel: LustreError: 21291:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 18 23:56:23 fir-md1-s1 kernel: LustreError: 21291:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 288 previous similar messages Jun 19 00:00:26 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 00:00:26 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 255 previous similar messages Jun 19 00:06:28 fir-md1-s1 kernel: LustreError: 21450:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 00:06:28 fir-md1-s1 kernel: LustreError: 21450:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 269 previous similar messages Jun 19 00:12:19 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 00:12:19 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 404 previous similar messages Jun 19 00:16:37 fir-md1-s1 
kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 00:16:37 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 312 previous similar messages Jun 19 00:26:00 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 00:26:00 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 438 previous similar messages Jun 19 00:26:38 fir-md1-s1 kernel: LustreError: 21741:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 00:26:38 fir-md1-s1 kernel: LustreError: 21741:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 264 previous similar messages Jun 19 00:36:40 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 00:36:40 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 276 previous similar messages Jun 19 00:37:46 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 00:37:46 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 638 previous similar messages Jun 19 00:41:45 fir-md1-s1 kernel: perf: interrupt took too long (3949 > 3943), lowering kernel.perf_event_max_sample_rate to 50000 Jun 19 00:46:42 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 00:46:42 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 299 previous similar messages Jun 19 00:50:39 fir-md1-s1 kernel: Lustre: 
20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 00:50:39 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 281 previous similar messages Jun 19 00:56:46 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 00:56:46 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 340 previous similar messages Jun 19 01:00:48 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 01:00:48 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 251 previous similar messages Jun 19 01:06:51 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 01:06:51 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1389 previous similar messages Jun 19 01:11:02 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 01:11:02 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 460 previous similar messages Jun 19 01:16:54 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 01:16:54 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 318 previous similar messages Jun 19 01:21:14 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 01:21:14 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 211 
previous similar messages Jun 19 01:27:05 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 01:27:05 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 706 previous similar messages Jun 19 01:31:20 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 01:31:20 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 269 previous similar messages Jun 19 01:37:18 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 01:37:18 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 762 previous similar messages Jun 19 01:44:13 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 01:44:13 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 295 previous similar messages Jun 19 01:47:31 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 01:47:31 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 758 previous similar messages Jun 19 01:57:17 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 01:57:17 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 179 previous similar messages Jun 19 01:57:38 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 
01:57:38 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 735 previous similar messages Jun 19 02:07:40 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 02:07:40 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 723 previous similar messages Jun 19 02:10:21 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 02:10:21 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 272 previous similar messages Jun 19 02:17:50 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 02:17:50 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 790 previous similar messages Jun 19 02:23:04 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 02:23:04 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 235 previous similar messages Jun 19 02:27:53 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 02:27:53 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1832 previous similar messages Jun 19 02:35:38 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 02:35:38 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 553 previous similar messages Jun 19 02:37:53 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 02:37:53 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 720 previous similar messages Jun 19 02:46:59 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 02:46:59 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 399 previous similar messages Jun 19 02:48:04 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 122880 GRANT, real grant 0 Jun 19 02:48:04 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 595 previous similar messages Jun 19 02:58:10 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 65536 GRANT, real grant 0 Jun 19 02:58:10 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 181 previous similar messages Jun 19 02:58:20 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 02:58:20 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 297 previous similar messages Jun 19 03:08:20 fir-md1-s1 kernel: LustreError: 21742:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 19 03:08:20 fir-md1-s1 kernel: LustreError: 21742:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 334 previous similar messages Jun 19 03:08:31 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 03:08:31 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 168 previous similar messages 
Jun 19 03:18:22 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 03:18:22 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1471 previous similar messages Jun 19 03:18:40 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 03:18:40 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 168 previous similar messages Jun 19 03:28:27 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 03:28:27 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 19 03:28:42 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 03:28:42 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 163 previous similar messages Jun 19 03:38:32 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 03:38:32 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 718 previous similar messages Jun 19 03:39:02 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 03:39:02 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 231 previous similar messages Jun 19 03:48:36 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 03:48:36 fir-md1-s1 kernel: 
LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1046 previous similar messages Jun 19 03:49:04 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 03:49:04 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 714 previous similar messages Jun 19 03:58:37 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 94208 GRANT, real grant 0 Jun 19 03:58:37 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 929 previous similar messages Jun 19 03:59:08 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 03:59:08 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 593 previous similar messages Jun 19 04:08:38 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 04:08:38 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1779 previous similar messages Jun 19 04:09:26 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 04:09:26 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 282 previous similar messages Jun 19 04:18:44 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 04:18:44 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2222 previous similar messages Jun 19 04:19:32 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to 
clear the changelog for user 1: -22 Jun 19 04:19:32 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 635 previous similar messages Jun 19 04:28:56 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 04:28:56 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 776 previous similar messages Jun 19 04:29:38 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 04:29:38 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 565 previous similar messages Jun 19 04:38:57 fir-md1-s1 kernel: LustreError: 21713:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 04:38:57 fir-md1-s1 kernel: LustreError: 21713:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1076 previous similar messages Jun 19 04:39:43 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 04:39:43 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 157 previous similar messages Jun 19 04:49:04 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 04:49:04 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1406 previous similar messages Jun 19 04:49:50 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 04:49:50 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 403 previous similar messages Jun 19 04:59:13 fir-md1-s1 kernel: LustreError: 
20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 04:59:13 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1324 previous similar messages Jun 19 04:59:58 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 04:59:58 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 207 previous similar messages Jun 19 05:09:17 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 19 05:09:17 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1180 previous similar messages Jun 19 05:10:52 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 05:10:52 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 270 previous similar messages Jun 19 05:19:19 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 05:19:19 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 957 previous similar messages Jun 19 05:20:54 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 05:20:54 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 455 previous similar messages Jun 19 05:29:27 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 05:29:27 fir-md1-s1 kernel: LustreError: 
27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1314 previous similar messages Jun 19 05:31:00 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 05:31:00 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 297 previous similar messages Jun 19 05:39:28 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 05:39:28 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1375 previous similar messages Jun 19 05:41:05 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 05:41:05 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 532 previous similar messages Jun 19 05:49:34 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 05:49:34 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1322 previous similar messages Jun 19 05:51:06 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 05:51:06 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 399 previous similar messages Jun 19 05:59:41 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 05:59:41 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1197 previous similar messages Jun 19 06:01:23 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 19 06:01:23 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 480 previous similar messages Jun 19 06:09:44 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 06:09:44 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1182 previous similar messages Jun 19 06:11:38 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 06:11:38 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 214 previous similar messages Jun 19 06:19:45 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 06:19:45 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 893 previous similar messages Jun 19 06:21:38 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 06:21:38 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 597 previous similar messages Jun 19 06:29:50 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 19 06:29:50 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 560 previous similar messages Jun 19 06:31:59 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 06:31:59 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 361 previous similar messages Jun 19 06:39:51 fir-md1-s1 kernel: LustreError: 
25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 06:39:51 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 824 previous similar messages Jun 19 06:42:05 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 06:42:05 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 274 previous similar messages Jun 19 06:49:52 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 06:49:52 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 828 previous similar messages Jun 19 06:52:09 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 06:52:09 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 227 previous similar messages Jun 19 06:59:53 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 147456 GRANT, real grant 0 Jun 19 06:59:53 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1021 previous similar messages Jun 19 07:02:31 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 07:02:31 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 175 previous similar messages Jun 19 07:09:54 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 151552 GRANT, real grant 0 Jun 19 07:09:54 fir-md1-s1 kernel: LustreError: 
20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 989 previous similar messages Jun 19 07:12:37 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 07:12:37 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 242 previous similar messages Jun 19 07:19:54 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 07:19:54 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1053 previous similar messages Jun 19 07:22:42 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 07:22:42 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 138 previous similar messages Jun 19 07:29:54 fir-md1-s1 kernel: LustreError: 21450:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 07:29:54 fir-md1-s1 kernel: LustreError: 21450:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1027 previous similar messages Jun 19 07:39:28 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 07:39:28 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 48 previous similar messages Jun 19 07:39:55 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 07:39:55 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 894 previous similar messages Jun 19 07:50:05 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 07:50:05 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 792 previous similar messages Jun 19 07:50:56 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 07:50:56 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 246 previous similar messages Jun 19 08:00:08 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 08:00:08 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 828 previous similar messages Jun 19 08:02:46 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 08:02:46 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 111 previous similar messages Jun 19 08:10:09 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 08:10:09 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1088 previous similar messages Jun 19 08:13:05 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 08:13:05 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 143 previous similar messages Jun 19 08:20:18 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 08:20:18 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1292 previous similar messages Jun 19 08:23:22 
fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 08:23:22 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 301 previous similar messages Jun 19 08:30:22 fir-md1-s1 kernel: LustreError: 23107:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 08:30:22 fir-md1-s1 kernel: LustreError: 23107:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 34339 previous similar messages Jun 19 08:33:57 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 08:33:57 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jun 19 08:40:22 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 122880 GRANT, real grant 0 Jun 19 08:40:22 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 26179 previous similar messages Jun 19 08:45:39 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 08:45:39 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 60 previous similar messages Jun 19 08:50:22 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 08:50:22 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1185 previous similar messages Jun 19 08:56:44 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 08:56:44 fir-md1-s1 kernel: Lustre: 
21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 35 previous similar messages Jun 19 09:00:24 fir-md1-s1 kernel: LustreError: 23093:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 19 09:00:24 fir-md1-s1 kernel: LustreError: 23093:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 301 previous similar messages Jun 19 09:09:07 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 09:09:07 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 30 previous similar messages Jun 19 09:10:27 fir-md1-s1 kernel: LustreError: 22434:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 09:10:27 fir-md1-s1 kernel: LustreError: 22434:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 197 previous similar messages Jun 19 09:20:28 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 09:20:28 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 294 previous similar messages Jun 19 09:26:05 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 09:26:05 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 39 previous similar messages Jun 19 09:30:44 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 09:30:44 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 219 previous similar messages Jun 19 09:38:09 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: 
Failure to clear the changelog for user 1: -22 Jun 19 09:38:09 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jun 19 09:40:51 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 09:40:51 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 256 previous similar messages Jun 19 09:48:24 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 09:48:24 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 49 previous similar messages Jun 19 09:51:04 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 09:51:04 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 286 previous similar messages Jun 19 10:01:07 fir-md1-s1 kernel: LustreError: 22975:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 10:01:07 fir-md1-s1 kernel: LustreError: 22975:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2282 previous similar messages Jun 19 10:06:06 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 10:06:06 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 22 previous similar messages Jun 19 10:11:10 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 19 10:11:10 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 475 previous similar messages Jun 19 10:16:14 
fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 10:16:14 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 161 previous similar messages Jun 19 10:21:14 fir-md1-s1 kernel: LustreError: 21742:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 19 10:21:14 fir-md1-s1 kernel: LustreError: 21742:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 515 previous similar messages Jun 19 10:26:18 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 10:26:18 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jun 19 10:31:14 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 10:31:14 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 567 previous similar messages Jun 19 10:38:19 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 10:38:19 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jun 19 10:41:17 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 10:41:17 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 482 previous similar messages Jun 19 10:51:20 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 10:51:20 fir-md1-s1 kernel: LustreError: 
20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 557 previous similar messages Jun 19 10:55:45 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 10:55:45 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jun 19 11:01:20 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 11:01:20 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 512 previous similar messages Jun 19 11:08:56 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 11:08:56 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 58 previous similar messages Jun 19 11:11:28 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 19 11:11:28 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 571 previous similar messages Jun 19 11:21:28 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 11:21:28 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 711 previous similar messages Jun 19 11:21:45 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 11:21:45 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jun 19 11:31:30 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 19 11:31:30 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 814 previous similar messages Jun 19 11:31:47 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 11:31:47 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 44 previous similar messages Jun 19 11:41:35 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 11:41:35 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 556 previous similar messages Jun 19 11:43:12 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 11:43:12 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jun 19 11:51:39 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 11:51:39 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 388 previous similar messages Jun 19 11:56:27 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 11:56:27 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 19 12:01:54 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 12:01:54 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 400 previous similar messages Jun 19 12:11:57 
fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 45056 GRANT, real grant 0 Jun 19 12:11:57 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 393 previous similar messages Jun 19 12:22:04 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 12:22:04 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 19 12:32:07 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 12:32:07 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 308 previous similar messages Jun 19 12:42:07 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 12:42:07 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 288 previous similar messages Jun 19 12:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 69994cc7-6cad-e493-9816-76214dd8e291 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2502e19800, cur 1560973672 expire 1560973522 last 1560973445 Jun 19 12:52:07 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 12:52:07 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 329 previous similar messages Jun 19 13:02:09 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 19 13:02:09 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 759 previous similar messages Jun 19 13:12:09 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 13:12:09 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1401 previous similar messages Jun 19 13:22:12 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 13:22:12 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1397 previous similar messages Jun 19 13:32:48 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 32768 GRANT, real grant 0 Jun 19 13:32:48 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1165 previous similar messages Jun 19 13:42:48 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 19 13:42:48 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 573 previous similar messages Jun 19 13:52:55 fir-md1-s1 kernel: LustreError: 
20510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 13:52:55 fir-md1-s1 kernel: LustreError: 20510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1035 previous similar messages Jun 19 14:02:57 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 19 14:02:57 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 124 previous similar messages Jun 19 14:12:58 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 65536 GRANT, real grant 0 Jun 19 14:12:58 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 516 previous similar messages Jun 19 14:23:01 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 14:23:01 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 411 previous similar messages Jun 19 14:34:43 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 19 14:34:43 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 673 previous similar messages Jun 19 14:44:48 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 14:44:48 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 19 14:54:51 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 
14:54:51 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 514 previous similar messages Jun 19 15:04:51 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 15:04:51 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 752 previous similar messages Jun 19 15:15:37 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 19 15:15:37 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1957 previous similar messages Jun 19 15:26:01 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 15:26:01 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 19 15:36:17 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 19 15:36:17 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 545 previous similar messages Jun 19 15:46:21 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 110592 GRANT, real grant 0 Jun 19 15:46:21 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16424 previous similar messages Jun 19 16:03:49 fir-md1-s1 kernel: LustreError: 21566:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 19 16:03:49 fir-md1-s1 kernel: LustreError: 21566:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 42796 previous similar messages Jun 19 
17:03:09 fir-md1-s1 kernel: sd 0:0:3:1: Inquiry data has changed Jun 19 17:03:21 fir-md1-s1 kernel: sd 0:0:3:1: Inquiry data has changed Jun 19 17:03:28 fir-md1-s1 kernel: sd 0:0:1:0: Inquiry data has changed Jun 19 17:03:38 fir-md1-s1 kernel: sd 0:0:1:0: Inquiry data has changed Jun 19 17:10:04 fir-md1-s1 kernel: sd 0:0:1:1: Inquiry data has changed Jun 19 17:10:04 fir-md1-s1 kernel: sd 0:0:1:2: Inquiry data has changed Jun 19 17:10:04 fir-md1-s1 kernel: sd 0:0:3:0: Inquiry data has changed Jun 19 17:10:04 fir-md1-s1 kernel: sd 0:0:3:2: Inquiry data has changed Jun 19 18:44:49 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 18:44:49 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jun 19 20:46:00 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 20:47:02 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 19 20:47:02 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jun 19 22:32:13 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 19 22:32:13 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jun 19 22:33:56 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 19 22:33:56 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 19 22:37:21 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 19 22:57:44 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 19 22:57:44 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 19 23:15:19 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 19 23:15:19 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 19 23:15:26 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 19 23:16:22 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 19 23:49:12 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 00:57:10 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 00:57:10 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 20 00:57:16 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 00:57:16 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jun 20 00:57:22 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, 
real grant 0 Jun 20 00:57:39 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 00:57:39 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 55 previous similar messages Jun 20 00:58:01 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 110592 GRANT, real grant 0 Jun 20 00:58:01 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 797 previous similar messages Jun 20 00:58:53 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 20 00:58:53 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 165 previous similar messages Jun 20 01:01:26 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 20 01:01:26 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 20 01:16:09 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 01:16:37 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 01:16:37 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jun 20 01:17:19 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 20 01:17:19 fir-md1-s1 kernel: LustreError: 
21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 57 previous similar messages Jun 20 01:18:39 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 20 01:18:39 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 107 previous similar messages Jun 20 01:21:12 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 01:21:12 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 78 previous similar messages Jun 20 01:26:17 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 01:26:17 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 84 previous similar messages Jun 20 01:36:23 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 01:36:23 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 489 previous similar messages Jun 20 01:47:08 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 01:47:08 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 443 previous similar messages Jun 20 01:57:14 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 01:57:14 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 473 previous similar messages Jun 20 02:07:14 fir-md1-s1 kernel: LustreError: 
25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 02:07:14 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 451 previous similar messages Jun 20 02:17:18 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 02:17:18 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 441 previous similar messages Jun 20 02:27:53 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 02:27:53 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1629 previous similar messages Jun 20 02:37:58 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 20 02:37:58 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 498 previous similar messages Jun 20 02:47:59 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 02:47:59 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 399 previous similar messages Jun 20 02:58:04 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 02:58:04 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 46 previous similar messages Jun 20 03:08:24 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 
03:08:24 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jun 20 03:18:26 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 03:18:26 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1300 previous similar messages Jun 20 03:22:00 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026113/real 1561026113] req@ffff8f1c772f4800 x1636708563212800/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026120 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 20 03:22:07 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026120/real 1561026120] req@ffff8f1c772f4800 x1636708563212800/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026127 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:22:07 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 20 03:22:08 fir-md1-s1 kernel: Lustre: 21482:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e057f0000 x1636580163248912/t0(0) o101->804bb2d0-a656-6c01-b0db-5b53058fb0f9@10.8.9.9@o2ib6:13/0 lens 480/568 e 1 to 0 dl 1561026133 ref 2 fl Interpret:/0/0 rc 0/0 Jun 20 03:22:14 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026127/real 1561026127] req@ffff8f24c36c2700 x1636708563213008/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026134 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:22:14 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous 
similar messages Jun 20 03:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6) reconnecting Jun 20 03:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 220a94f1-3873-c0d2-13c3-2a8b3b58132e (at 10.8.29.8@o2ib6) Jun 20 03:22:21 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026134/real 1561026134] req@ffff8f1c772f4800 x1636708563212800/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026141 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:22:21 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 20 03:22:28 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026141/real 1561026141] req@ffff8f24c36c2700 x1636708563213008/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026148 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:22:28 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jun 20 03:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:22:42 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026155/real 1561026155] req@ffff8f1c772f4800 x1636708563212800/t0(0) 
o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026162 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:22:42 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 20 03:22:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:22:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:22:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:22:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:23:03 fir-md1-s1 kernel: Lustre: 22289:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026176/real 1561026176] req@ffff8f1e30743c00 x1636708563213088/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026183 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:23:03 fir-md1-s1 kernel: Lustre: 22289:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jun 20 03:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:23:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:23:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:23:38 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026211/real 1561026211] req@ffff8f24c36c2700 x1636708563213008/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026218 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:23:38 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous 
similar messages Jun 20 03:23:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:23:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:23:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:23:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:23:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:23:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 03:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) reconnecting Jun 20 03:24:41 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 20 03:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 20 03:24:41 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 20 03:24:48 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561026281/real 1561026281] req@ffff8f1c772f4800 x1636708563212800/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561026288 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 20 03:24:48 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 27 previous similar messages Jun 20 03:24:48 fir-md1-s1 kernel: LustreError: 20460:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) returned error from glimpse AST (req@ffff8f24c36c2700 x1636708563213008 status -107 rc -107), evict it ns: 
mdt-fir-MDT0000_UUID lock: ffff8f2376712f40/0x5d9ee61d1db84dae lrc: 4/0,0 mode: PW/PW res: [0x200025f94:0x1628f:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.9.8@o2ib6 remote: 0xb7f6b0a5194d419f expref: 59 pid: 21433 timeout: 0 lvb_type: 0 Jun 20 03:24:48 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Jun 20 03:24:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 197s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1330a91680/0x5d9ee61d1db8506a lrc: 4/0,0 mode: PW/PW res: [0x200025b09:0x2431:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.9.8@o2ib6 remote: 0xb7f6b0a5194d444d expref: 60 pid: 26257 timeout: 0 lvb_type: 0 Jun 20 03:24:48 fir-md1-s1 kernel: LustreError: 20460:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages Jun 20 03:25:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2a12b0b1-96b1-b609-eece-2f0222928c53 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2507b0a000, cur 1561026330 expire 1561026180 last 1561026103 Jun 20 03:25:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 20 03:28:39 fir-md1-s1 kernel: LustreError: 21710:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 86016 GRANT, real grant 0 Jun 20 03:28:39 fir-md1-s1 kernel: LustreError: 21710:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 275 previous similar messages Jun 20 03:38:49 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 03:38:49 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 20 03:48:54 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 03:48:54 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 760 previous similar messages Jun 20 03:58:56 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 03:58:56 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 716 previous similar messages Jun 20 04:08:56 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 20 04:08:56 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2561 previous similar messages Jun 20 04:18:57 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 04:18:57 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 706 previous similar 
messages Jun 20 04:28:58 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 04:28:58 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 577 previous similar messages Jun 20 04:39:06 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 04:39:06 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 919 previous similar messages Jun 20 04:49:14 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 04:49:14 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1034 previous similar messages Jun 20 04:59:17 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 04:59:17 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1061 previous similar messages Jun 20 05:09:20 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 20 05:09:20 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 923 previous similar messages Jun 20 05:19:34 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 20 05:19:34 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 825 previous similar messages Jun 20 05:29:36 fir-md1-s1 kernel: LustreError: 21710:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 05:29:36 fir-md1-s1 kernel: LustreError: 21710:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1005 previous similar messages Jun 20 05:39:40 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 20 05:39:40 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 743 previous similar messages Jun 20 05:49:44 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 05:49:44 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 959 previous similar messages Jun 20 05:59:50 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 05:59:50 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 781 previous similar messages Jun 20 06:09:57 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 06:09:57 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1045 previous similar messages Jun 20 06:19:58 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 118784 GRANT, real grant 0 Jun 20 06:19:58 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 892 previous similar messages Jun 20 06:30:06 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 06:30:06 fir-md1-s1 kernel: LustreError: 
20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 640 previous similar messages Jun 20 06:40:11 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 20 06:40:11 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 524 previous similar messages Jun 20 06:50:16 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 06:50:16 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 494 previous similar messages Jun 20 07:00:22 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 07:00:22 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 542 previous similar messages Jun 20 07:10:25 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 07:10:25 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 727 previous similar messages Jun 20 07:20:27 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 07:20:27 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 776 previous similar messages Jun 20 07:30:29 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 07:30:29 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 700 previous similar messages Jun 20 07:40:33 fir-md1-s1 kernel: LustreError: 
21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 07:40:33 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 687 previous similar messages Jun 20 07:50:39 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 135168 GRANT, real grant 0 Jun 20 07:50:39 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 732 previous similar messages Jun 20 08:00:43 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 08:00:43 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 536 previous similar messages Jun 20 08:10:43 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 08:10:43 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 616 previous similar messages Jun 20 08:20:46 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 49152 GRANT, real grant 0 Jun 20 08:20:46 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 703 previous similar messages Jun 20 08:30:56 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 08:30:56 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 681 previous similar messages Jun 20 08:40:59 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 
08:40:59 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 58668 previous similar messages Jun 20 08:51:01 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 20 08:51:01 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1290 previous similar messages Jun 20 09:02:49 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 09:02:49 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 521 previous similar messages Jun 20 09:18:28 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 09:18:28 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jun 20 09:31:48 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 09:31:48 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 32 previous similar messages Jun 20 09:46:56 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 09:46:56 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jun 20 09:56:57 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 09:56:57 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1835 previous similar messages Jun 20 10:07:00 
fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 10:07:00 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 20 10:17:08 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 10:17:08 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 735 previous similar messages Jun 20 10:27:10 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 10:27:10 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 787 previous similar messages Jun 20 10:37:18 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 86016 GRANT, real grant 0 Jun 20 10:37:18 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 51449 previous similar messages Jun 20 10:47:19 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 73728 GRANT, real grant 0 Jun 20 10:47:19 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7765 previous similar messages Jun 20 10:57:41 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 102400 GRANT, real grant 0 Jun 20 10:57:41 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 246 previous similar messages Jun 20 11:08:11 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 
155648 GRANT, real grant 0 Jun 20 11:08:11 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 226 previous similar messages Jun 20 11:18:18 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 11:18:18 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 236 previous similar messages Jun 20 11:28:22 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 86016 GRANT, real grant 0 Jun 20 11:28:22 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2275 previous similar messages Jun 20 11:38:24 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 98304 GRANT, real grant 0 Jun 20 11:38:24 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 590 previous similar messages Jun 20 11:48:26 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 11:48:26 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 684 previous similar messages Jun 20 11:58:31 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 11:58:31 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 912 previous similar messages Jun 20 12:08:33 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 12:08:33 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 58823 
previous similar messages Jun 20 12:08:50 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 12:08:50 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 20 12:18:33 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 12:18:33 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1245 previous similar messages Jun 20 12:28:33 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 12:28:33 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 230 previous similar messages Jun 20 12:38:34 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 12:38:34 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 697 previous similar messages Jun 20 12:48:34 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 12:48:34 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 771 previous similar messages Jun 20 12:58:35 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 20 12:58:35 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 879 previous similar messages Jun 20 13:08:36 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 20 13:08:36 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1448 previous similar messages Jun 20 13:18:38 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 20 13:18:38 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1775 previous similar messages Jun 20 13:28:38 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 13:28:38 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1863 previous similar messages Jun 20 13:58:35 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 13:58:35 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 664 previous similar messages Jun 20 14:00:47 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 14:00:47 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 20 14:27:25 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 14:29:37 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 14:29:37 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 20 14:56:14 fir-md1-s1 kernel: LustreError: 
27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 14:56:19 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 20 14:58:27 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 15:24:30 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 15:24:43 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 15:25:04 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 15:49:49 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 15:49:49 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jun 20 15:50:19 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 20 15:52:15 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 15:54:07 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 15:54:07 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 20 15:55:45 fir-md1-s1 kernel: Lustre: 
21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:02:11 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 16:04:14 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:05:33 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:06:31 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:14:02 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 16:19:26 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:19:39 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:20:07 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:20:07 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jun 20 16:23:50 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:25:51 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 16:26:54 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: 
-22 Jun 20 16:27:43 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:29:25 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:32:15 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:32:15 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 20 16:37:41 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 16:39:27 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:39:27 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jun 20 16:49:31 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 16:49:31 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 20 16:49:36 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 16:55:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 815d7676-5c34-1cc9-c5dd-bad0fb6e70bb (at 10.8.14.8@o2ib6) Jun 20 16:55:30 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 20 16:55:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d8cec7bd-0c71-5918-8514-07b7e416bc71 (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4518b05c00, cur 1561074934 expire 1561074784 last 1561074707 Jun 20 16:55:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 20 16:55:36 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 815d7676-5c34-1cc9-c5dd-bad0fb6e70bb (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25257d9800, cur 1561074936 expire 1561074786 last 1561074709 Jun 20 17:01:27 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 17:02:07 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 17:02:07 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 20 17:12:20 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 17:12:20 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jun 20 17:13:14 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 17:22:21 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 17:22:21 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 20 17:25:06 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 17:32:52 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 17:32:52 fir-md1-s1 kernel: Lustre: 
21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jun 20 17:36:31 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 17:43:31 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 17:43:31 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 184 previous similar messages Jun 20 17:48:22 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 17:53:46 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 17:53:46 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Jun 20 18:00:13 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 18:04:02 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 18:04:02 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 85 previous similar messages Jun 20 18:12:12 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 18:15:32 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 18:15:32 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 48 previous similar messages Jun 20 18:23:57 fir-md1-s1 kernel: LustreError: 
27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 18:26:00 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 18:26:00 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jun 20 18:36:11 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 18:37:35 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 18:37:35 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages Jun 20 18:47:33 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 18:49:22 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 18:49:22 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 37 previous similar messages Jun 20 18:59:21 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 18:59:39 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 18:59:39 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages Jun 20 19:10:58 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 19:10:58 fir-md1-s1 kernel: Lustre: 
21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jun 20 19:11:36 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 19:21:30 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 19:21:30 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 33 previous similar messages Jun 20 19:23:25 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 19:35:14 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 19:38:18 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 19:38:18 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jun 20 19:47:04 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 19:57:52 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 19:57:52 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jun 20 19:58:52 fir-md1-s1 kernel: LustreError: 21717:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 20:10:41 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 
GRANT, real grant 0 Jun 20 20:16:38 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 20:16:38 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 20 20:22:28 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 20:34:16 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 20:46:08 fir-md1-s1 kernel: LustreError: 27482:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 20:58:00 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 21:09:51 fir-md1-s1 kernel: LustreError: 21717:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 21:21:35 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 21:33:18 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 21:44:26 fir-md1-s1 kernel: LustreError: 21717:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 21:45:46 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 21:45:46 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 
previous similar messages Jun 20 21:55:32 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:04:30 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:06:09 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:16:51 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:19:43 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:20:34 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:21:50 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:26:13 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:27:15 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:27:33 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:31:33 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:31:33 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 20 22:34:01 fir-md1-s1 kernel: Lustre: 
21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:36:01 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:36:01 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 20 22:37:40 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:42:17 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:47:57 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:48:13 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:48:13 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 83 previous similar messages Jun 20 22:58:13 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 22:58:33 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 22:58:33 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jun 20 23:08:23 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 23:09:44 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 23:09:44 
fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 184 previous similar messages Jun 20 23:18:34 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 20 23:22:00 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 23:22:00 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Jun 20 23:32:15 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 23:32:15 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 31 previous similar messages Jun 20 23:42:48 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 23:42:48 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages Jun 20 23:53:11 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 20 23:53:11 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 107 previous similar messages Jun 21 00:03:48 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 00:03:48 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 29 previous similar messages Jun 21 00:13:52 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 00:13:52 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 67 previous similar messages Jun 
21 00:24:09 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 00:24:09 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 279 previous similar messages Jun 21 00:34:17 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 00:34:17 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 76 previous similar messages Jun 21 00:44:28 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 00:44:28 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 237 previous similar messages Jun 21 00:54:33 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 00:54:33 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 604 previous similar messages Jun 21 00:57:09 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 00:57:15 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 00:57:21 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 00:57:23 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 00:57:23 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jun 21 00:57:27 
fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 00:57:27 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31 previous similar messages Jun 21 00:57:41 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 21 00:57:41 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jun 21 00:57:57 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 21 00:57:57 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 803 previous similar messages Jun 21 00:58:37 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 21 00:58:37 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jun 21 01:01:46 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 21 01:01:46 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 163 previous similar messages Jun 21 01:04:36 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 01:04:36 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 111 previous similar messages Jun 21 01:14:48 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 01:14:48 fir-md1-s1 kernel: 
Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 666 previous similar messages Jun 21 01:16:27 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 01:16:56 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 01:16:56 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jun 21 01:17:38 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 106496 GRANT, real grant 0 Jun 21 01:17:38 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 35 previous similar messages Jun 21 01:18:48 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 21 01:18:48 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 92 previous similar messages Jun 21 01:20:59 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 01:20:59 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 70 previous similar messages Jun 21 01:24:49 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 01:24:49 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1375 previous similar messages Jun 21 01:25:25 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 01:25:25 fir-md1-s1 
kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 74 previous similar messages Jun 21 01:34:02 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 01:34:02 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 438 previous similar messages Jun 21 01:34:49 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 01:34:49 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 384 previous similar messages Jun 21 01:44:06 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 01:44:06 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 503 previous similar messages Jun 21 01:44:52 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 01:44:52 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 884 previous similar messages Jun 21 01:54:16 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 01:54:16 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 21 01:54:53 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 01:54:53 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1624 previous similar messages Jun 21 02:04:21 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 02:04:21 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 478 previous similar messages Jun 21 02:04:54 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 02:04:54 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 296 previous similar messages Jun 21 02:14:31 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 02:14:31 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 470 previous similar messages Jun 21 02:15:03 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 02:15:03 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3707 previous similar messages Jun 21 02:24:35 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 02:24:35 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 21 02:25:08 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 02:25:08 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2273 previous similar messages Jun 21 02:34:44 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 02:34:44 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1632 previous similar messages Jun 21 02:35:22 
fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 02:35:22 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 330 previous similar messages Jun 21 02:44:51 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 02:44:51 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 421 previous similar messages Jun 21 02:45:29 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 02:45:29 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1489 previous similar messages Jun 21 02:55:31 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 02:55:31 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1643 previous similar messages Jun 21 02:55:55 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 02:55:55 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 49 previous similar messages Jun 21 03:05:32 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 03:05:32 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 259 previous similar messages Jun 21 03:05:57 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 03:05:57 fir-md1-s1 kernel: LustreError: 
25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 32 previous similar messages Jun 21 03:15:37 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 03:15:37 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1410 previous similar messages Jun 21 03:16:05 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 126976 GRANT, real grant 0 Jun 21 03:16:05 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1263 previous similar messages Jun 21 03:25:37 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 03:25:37 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2751 previous similar messages Jun 21 03:26:08 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 03:26:08 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 243 previous similar messages Jun 21 03:35:56 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 03:35:56 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 288 previous similar messages Jun 21 03:36:09 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 03:36:09 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 383 previous similar messages Jun 21 03:45:59 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 21 03:45:59 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1828 previous similar messages Jun 21 03:46:10 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 03:46:10 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 716 previous similar messages Jun 21 03:55:59 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 03:55:59 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2206 previous similar messages Jun 21 03:56:10 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 03:56:10 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 591 previous similar messages Jun 21 04:06:00 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 04:06:00 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 255 previous similar messages Jun 21 04:06:13 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 04:06:13 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1670 previous similar messages Jun 21 04:16:01 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 04:16:01 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3284 previous similar messages Jun 21 04:16:14 fir-md1-s1 kernel: LustreError: 
27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 04:16:14 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1822 previous similar messages Jun 21 04:26:02 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 04:26:02 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1023 previous similar messages Jun 21 04:26:15 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 04:26:15 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 530 previous similar messages Jun 21 04:36:03 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 04:36:03 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 703 previous similar messages Jun 21 04:36:17 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 04:36:17 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 883 previous similar messages Jun 21 04:46:03 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 04:46:03 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5430 previous similar messages Jun 21 04:46:25 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 04:46:25 fir-md1-s1 kernel: LustreError: 
25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1018 previous similar messages Jun 21 04:56:05 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 04:56:05 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 384 previous similar messages Jun 21 04:56:29 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 04:56:29 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 976 previous similar messages Jun 21 05:06:06 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 05:06:06 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4767 previous similar messages Jun 21 05:06:32 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 106496 GRANT, real grant 0 Jun 21 05:06:32 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 661 previous similar messages Jun 21 05:16:09 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 05:16:09 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3850 previous similar messages Jun 21 05:16:49 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 05:16:49 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 898 previous similar messages Jun 21 05:26:09 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 21 05:26:09 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 376 previous similar messages Jun 21 05:26:51 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 05:26:51 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 969 previous similar messages Jun 21 05:36:11 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 05:36:11 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4162 previous similar messages Jun 21 05:36:53 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 05:36:53 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1049 previous similar messages Jun 21 05:46:15 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 05:46:15 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2518 previous similar messages Jun 21 05:47:01 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 05:47:01 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1028 previous similar messages Jun 21 05:56:16 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 05:56:16 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 459 previous similar messages Jun 21 05:57:15 fir-md1-s1 kernel: LustreError: 
27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 05:57:15 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 732 previous similar messages Jun 21 06:06:16 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 06:06:16 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3435 previous similar messages Jun 21 06:07:26 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 06:07:26 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1041 previous similar messages Jun 21 06:16:16 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 06:16:16 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3560 previous similar messages Jun 21 06:17:30 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 06:17:30 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 834 previous similar messages Jun 21 06:26:25 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 06:26:25 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 410 previous similar messages Jun 21 06:27:34 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 06:27:34 fir-md1-s1 kernel: LustreError: 
27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 551 previous similar messages Jun 21 06:36:28 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 06:36:28 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3289 previous similar messages Jun 21 06:37:34 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 06:37:34 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 309 previous similar messages Jun 21 06:46:30 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 06:46:30 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3608 previous similar messages Jun 21 06:47:35 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 06:47:35 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 524 previous similar messages Jun 21 06:56:31 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 06:56:31 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 370 previous similar messages Jun 21 06:57:35 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 06:57:35 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 755 previous similar messages Jun 21 07:06:33 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 21 07:06:33 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2292 previous similar messages Jun 21 07:07:39 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 07:07:39 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 746 previous similar messages Jun 21 07:16:34 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 07:16:34 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5203 previous similar messages Jun 21 07:17:39 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 07:17:39 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 675 previous similar messages Jun 21 07:26:36 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 07:26:36 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 466 previous similar messages Jun 21 07:27:49 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 07:27:49 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 689 previous similar messages Jun 21 07:36:48 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 07:36:48 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2843 previous similar messages Jun 21 07:37:53 fir-md1-s1 kernel: LustreError: 
20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 07:37:53 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 717 previous similar messages Jun 21 07:46:49 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 07:46:49 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2442 previous similar messages Jun 21 07:47:59 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 135168 GRANT, real grant 0 Jun 21 07:47:59 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 696 previous similar messages Jun 21 07:56:55 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 07:56:55 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 519 previous similar messages Jun 21 07:57:59 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 81920 GRANT, real grant 0 Jun 21 07:57:59 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 438 previous similar messages Jun 21 08:06:55 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 08:06:55 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2184 previous similar messages Jun 21 08:08:07 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 08:08:07 fir-md1-s1 kernel: LustreError: 
22156:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 539 previous similar messages Jun 21 08:16:56 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 08:16:56 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3160 previous similar messages Jun 21 08:18:11 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 21 08:18:11 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 925 previous similar messages Jun 21 08:27:00 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 08:27:00 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 350 previous similar messages Jun 21 08:28:25 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 08:28:25 fir-md1-s1 kernel: LustreError: 25998:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 756 previous similar messages Jun 21 08:37:06 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 08:37:06 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4244 previous similar messages Jun 21 08:38:25 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 08:38:25 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 58533 previous similar messages Jun 21 08:47:15 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 21 08:47:15 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3126 previous similar messages Jun 21 08:48:28 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 21 08:48:28 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1221 previous similar messages Jun 21 08:57:15 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 08:57:15 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 325 previous similar messages Jun 21 09:02:18 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 09:02:18 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jun 21 09:07:16 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 09:07:16 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2868 previous similar messages Jun 21 09:17:17 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 09:17:17 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2968 previous similar messages Jun 21 09:17:25 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 09:17:25 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jun 21 09:27:19 fir-md1-s1 kernel: Lustre: 
21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 09:27:19 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 618 previous similar messages Jun 21 09:30:54 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 09:30:54 fir-md1-s1 kernel: LustreError: 20504:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 32 previous similar messages Jun 21 09:37:20 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 09:37:20 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2752 previous similar messages Jun 21 09:47:22 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 09:47:22 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2202 previous similar messages Jun 21 09:57:23 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 09:57:23 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 577 previous similar messages Jun 21 10:05:13 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 21 10:05:13 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jun 21 10:07:27 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 10:07:27 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3400 previous similar messages Jun 
21 10:17:31 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 10:17:31 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1640 previous similar messages Jun 21 10:27:38 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 10:27:38 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2409 previous similar messages Jun 21 10:37:40 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 10:37:40 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2335 previous similar messages Jun 21 10:47:42 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 10:47:42 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 557 previous similar messages Jun 21 10:57:45 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 10:57:45 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2457 previous similar messages Jun 21 11:06:44 fir-md1-s1 kernel: Lustre: 21073:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e18e9a100 x1635087235946768/t0(0) o36->c50a2569-5f68-c0c4-a8b8-bfb61fe4dbbb@10.9.114.5@o2ib4:19/0 lens 536/2888 e 1 to 0 dl 1561140409 ref 2 fl Interpret:/0/0 rc 0/0 Jun 21 11:06:44 fir-md1-s1 kernel: Lustre: 21073:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jun 21 11:06:46 fir-md1-s1 kernel: Lustre: 21073:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add 
any time (5/5), not sending early reply req@ffff8f0e234bc800 x1634124367904096/t0(0) o36->190e8c90-938d-b7f6-84df-7662b8e78e53@10.9.107.71@o2ib4:21/0 lens 552/2888 e 1 to 0 dl 1561140411 ref 2 fl Interpret:/0/0 rc 0/0 Jun 21 11:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c50a2569-5f68-c0c4-a8b8-bfb61fe4dbbb (at 10.9.114.5@o2ib4) reconnecting Jun 21 11:06:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 21 11:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a99d6390-552e-efef-43b1-60bd87733129 (at 10.9.114.5@o2ib4) Jun 21 11:06:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 21 11:06:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8ba50a96-f3d9-3920-760c-8aedb752cbea (at 10.9.107.71@o2ib4) Jun 21 11:07:41 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jun 21 11:07:41 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (105): c: 7, oc: 0, rc: 8 Jun 21 11:09:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14c7777000, cur 1561140567 expire 1561140417 last 1561140340 Jun 21 11:09:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 21 11:09:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1675e4f5-80cb-6029-9271-7b3f4a7873d6 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2532670000, cur 1561140583 expire 1561140433 last 1561140356 Jun 21 11:10:25 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 11:10:25 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 21 11:10:51 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 11:10:51 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jun 21 11:11:22 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 11:11:22 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jun 21 11:12:06 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 110592 GRANT, real grant 0 Jun 21 11:12:06 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1585 previous similar messages Jun 21 11:13:27 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 21 11:13:27 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 209 previous similar messages Jun 21 11:13:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 21 11:15:21 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 11:15:21 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 
157 previous similar messages Jun 21 11:16:07 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 21 11:16:07 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 21 11:21:08 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 11:21:08 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 21 11:31:17 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 11:31:17 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 21 11:40:59 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Jun 21 11:40:59 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (108): c: 8, oc: 0, rc: 8 Jun 21 11:41:19 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 11:41:19 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 550 previous similar messages Jun 21 11:47:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 21 11:47:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 21 11:51:07 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 11:51:07 fir-md1-s1 kernel: Lustre: 
27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5477 previous similar messages Jun 21 11:51:35 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 21 11:51:35 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 20361 previous similar messages Jun 21 11:52:22 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 11:52:22 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3381 previous similar messages Jun 21 11:54:53 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 11:54:53 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1585 previous similar messages Jun 21 12:00:15 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 12:00:15 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 95 previous similar messages Jun 21 12:10:16 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 12:10:16 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1504 previous similar messages Jun 21 12:20:21 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 12:20:21 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5067 previous similar messages Jun 21 12:30:24 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 12:30:24 
fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 689 previous similar messages Jun 21 12:40:24 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 12:40:24 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5807 previous similar messages Jun 21 12:50:25 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 12:50:25 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2656 previous similar messages Jun 21 13:00:26 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 13:00:26 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6064 previous similar messages Jun 21 13:10:28 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 13:10:28 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4237 previous similar messages Jun 21 13:20:28 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 13:20:28 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5924 previous similar messages Jun 21 13:30:40 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 13:30:40 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4528 previous similar messages Jun 21 13:40:42 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 13:40:42 
fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5875 previous similar messages Jun 21 13:50:44 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 13:50:44 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 431 previous similar messages Jun 21 14:00:53 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 14:00:53 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3226 previous similar messages Jun 21 14:10:54 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 14:10:54 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2729 previous similar messages Jun 21 14:20:56 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 14:20:56 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2474 previous similar messages Jun 21 14:31:14 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 14:31:14 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9037 previous similar messages Jun 21 14:41:21 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 14:41:21 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 309 previous similar messages Jun 21 14:51:29 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 14:51:29 
fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 461 previous similar messages Jun 21 15:01:32 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 15:01:32 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6161 previous similar messages Jun 21 15:11:35 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 15:11:35 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 269 previous similar messages Jun 21 15:21:37 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 15:21:37 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4361 previous similar messages Jun 21 15:31:42 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 15:31:42 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5741 previous similar messages Jun 21 15:41:52 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 15:41:52 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 380 previous similar messages Jun 21 15:52:24 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 15:52:24 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4143 previous similar messages Jun 21 16:02:25 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 16:02:25 
fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4240 previous similar messages Jun 21 16:12:27 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 16:12:27 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1721 previous similar messages Jun 21 16:22:36 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 16:22:36 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4798 previous similar messages Jun 21 16:32:37 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 16:32:37 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 370 previous similar messages Jun 21 16:43:00 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 16:43:00 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5984 previous similar messages Jun 21 16:53:02 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 16:53:02 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 374 previous similar messages Jun 21 17:03:11 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 17:03:11 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6833 previous similar messages Jun 21 17:13:12 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 17:13:12 
fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6018 previous similar messages Jun 21 17:23:13 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 17:23:13 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 460 previous similar messages Jun 21 17:33:14 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 17:33:14 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3622 previous similar messages Jun 21 17:43:14 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 17:43:14 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1956 previous similar messages Jun 21 17:53:29 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 17:53:29 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 827 previous similar messages Jun 21 18:03:39 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 18:03:39 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1131 previous similar messages Jun 21 18:13:40 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 18:13:40 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1681 previous similar messages Jun 21 18:20:28 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real 
grant 0 Jun 21 18:20:28 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 38827 previous similar messages Jun 21 18:21:46 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 21 18:21:46 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1603 previous similar messages Jun 21 18:24:24 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 18:24:24 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2154 previous similar messages Jun 21 18:26:00 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 21 18:26:00 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 214 previous similar messages Jun 21 18:31:03 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 21 18:31:03 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 21 18:34:39 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 18:34:39 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3335 previous similar messages Jun 21 18:41:04 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 18:41:04 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 413 previous similar messages Jun 21 18:44:55 fir-md1-s1 kernel: Lustre: 
21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 18:44:55 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 255 previous similar messages Jun 21 18:51:06 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 21 18:51:06 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 564 previous similar messages Jun 21 18:54:57 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 18:54:57 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2306 previous similar messages Jun 21 19:01:22 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 94208 GRANT, real grant 0 Jun 21 19:01:22 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15785 previous similar messages Jun 21 19:05:20 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 19:05:20 fir-md1-s1 kernel: Lustre: 21668:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2455 previous similar messages Jun 21 19:15:25 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 19:15:25 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 281 previous similar messages Jun 21 19:25:42 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 19:25:42 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2577 previous similar messages 
Jun 21 19:36:06 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 19:36:06 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3780 previous similar messages Jun 21 19:46:10 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 19:46:10 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 267 previous similar messages Jun 21 19:56:37 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 19:56:37 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1344 previous similar messages Jun 21 20:06:44 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 20:06:44 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1396 previous similar messages Jun 21 20:16:54 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 20:16:54 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 299 previous similar messages Jun 21 20:26:57 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 20:26:57 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 278 previous similar messages Jun 21 20:36:59 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 20:36:59 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1285 previous similar messages Jun 21 
20:47:01 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 20:47:01 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2576 previous similar messages Jun 21 20:57:07 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 20:57:07 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 624 previous similar messages Jun 21 21:07:10 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 21:07:10 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3192 previous similar messages Jun 21 21:17:10 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 21:17:10 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2201 previous similar messages Jun 21 21:27:12 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 21:27:12 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 887 previous similar messages Jun 21 21:37:14 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 21:37:14 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2216 previous similar messages Jun 21 21:47:25 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 21:47:25 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1435 previous similar messages Jun 21 
21:57:40 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 21:57:40 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 273 previous similar messages Jun 21 22:08:02 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 22:08:02 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 625 previous similar messages Jun 21 22:18:08 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 22:18:08 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1774 previous similar messages Jun 21 22:28:34 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 22:28:34 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 359 previous similar messages Jun 21 22:38:38 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 22:38:38 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3257 previous similar messages Jun 21 22:48:38 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 22:48:38 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2435 previous similar messages Jun 21 22:58:49 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 22:58:49 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 319 previous similar messages Jun 21 23:08:57 
fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 23:08:57 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 485 previous similar messages Jun 21 23:11:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2fbdd3a1-1348-387a-9c62-8e4888f673df (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ea3d8800, cur 1561183889 expire 1561183739 last 1561183662 Jun 21 23:11:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 21 23:11:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d80b2c48-58e4-12d3-5b26-0e7343b58644 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f1aed7c00, cur 1561183894 expire 1561183744 last 1561183667 Jun 21 23:11:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 21 23:11:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec76f1db-9c9b-bbe0-847f-90a9d517c8dc (at 10.8.9.8@o2ib6) Jun 21 23:11:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 21 23:18:59 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 23:18:59 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2494 previous similar messages Jun 21 23:28:59 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 23:28:59 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4301 previous similar messages Jun 21 23:39:01 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 23:39:01 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5072 previous 
similar messages Jun 21 23:49:01 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 23:49:01 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5445 previous similar messages Jun 21 23:59:02 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 21 23:59:02 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4972 previous similar messages Jun 22 00:09:02 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 00:09:02 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3593 previous similar messages Jun 22 00:19:02 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 00:19:02 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3499 previous similar messages Jun 22 00:29:03 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 00:29:03 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3488 previous similar messages Jun 22 00:39:03 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 00:39:03 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3496 previous similar messages Jun 22 00:49:04 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 00:49:04 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4026 previous 
similar messages Jun 22 00:57:11 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 00:57:11 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43417 previous similar messages Jun 22 00:58:47 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 00:58:47 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1937 previous similar messages Jun 22 00:59:05 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 00:59:05 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4224 previous similar messages Jun 22 01:01:46 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 22 01:01:46 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 125 previous similar messages Jun 22 01:09:06 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 01:09:06 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 606 previous similar messages Jun 22 01:16:58 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 01:16:58 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 22 01:17:39 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real 
grant 0 Jun 22 01:17:39 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 22 01:18:54 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 57344 GRANT, real grant 0 Jun 22 01:18:54 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 87 previous similar messages Jun 22 01:19:07 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 01:19:07 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2365 previous similar messages Jun 22 01:21:26 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 01:21:26 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 273 previous similar messages Jun 22 01:26:30 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 49152 GRANT, real grant 0 Jun 22 01:26:30 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 280 previous similar messages Jun 22 01:29:07 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 01:29:07 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4021 previous similar messages Jun 22 01:36:33 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 01:36:33 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 632 previous similar messages Jun 22 01:39:07 fir-md1-s1 kernel: Lustre: 
20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 01:39:07 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4213 previous similar messages Jun 22 01:46:36 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 01:46:36 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 959 previous similar messages Jun 22 01:49:07 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 01:49:07 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4191 previous similar messages Jun 22 01:56:39 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 01:56:39 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 656 previous similar messages Jun 22 01:59:07 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 01:59:07 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4250 previous similar messages Jun 22 02:06:46 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 02:06:46 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1005 previous similar messages Jun 22 02:09:09 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 02:09:09 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3816 
previous similar messages Jun 22 02:16:55 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 02:16:55 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 657 previous similar messages Jun 22 02:19:09 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 02:19:09 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4170 previous similar messages Jun 22 02:27:02 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 02:27:02 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2118 previous similar messages Jun 22 02:29:10 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 02:29:10 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3770 previous similar messages Jun 22 02:37:06 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 02:37:06 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 646 previous similar messages Jun 22 02:39:13 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 02:39:13 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3238 previous similar messages Jun 22 02:47:09 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 
02:47:09 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 947 previous similar messages Jun 22 02:49:13 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 02:49:13 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3264 previous similar messages Jun 22 02:57:11 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 02:57:11 fir-md1-s1 kernel: LustreError: 21716:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 426 previous similar messages Jun 22 02:59:13 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 02:59:13 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3218 previous similar messages Jun 22 03:07:12 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 03:07:12 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 448 previous similar messages Jun 22 03:09:14 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 03:09:14 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3552 previous similar messages Jun 22 03:17:13 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 03:17:13 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1420 previous similar messages Jun 22 03:19:15 fir-md1-s1 kernel: Lustre: 
20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 03:19:15 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3482 previous similar messages Jun 22 03:27:14 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 03:27:14 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 629 previous similar messages Jun 22 03:29:15 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 03:29:15 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3227 previous similar messages Jun 22 03:37:15 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 03:37:15 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 726 previous similar messages Jun 22 03:39:15 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 03:39:15 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2921 previous similar messages Jun 22 03:47:17 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 03:47:17 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 872 previous similar messages Jun 22 03:49:16 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 03:49:16 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2604 
previous similar messages Jun 22 03:57:19 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 03:57:19 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1046 previous similar messages Jun 22 03:59:16 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 03:59:16 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2810 previous similar messages Jun 22 04:07:22 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 04:07:22 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 791 previous similar messages Jun 22 04:09:17 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 04:09:17 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2460 previous similar messages Jun 22 04:17:23 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 28672 GRANT, real grant 0 Jun 22 04:17:23 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1540 previous similar messages Jun 22 04:19:17 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 04:19:17 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2654 previous similar messages Jun 22 04:25:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 22 04:25:36 fir-md1-s1 kernel: Lustre: Skipped 2 
previous similar messages Jun 22 04:26:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 768e69f1-686d-dc63-c888-d7b2745331f7 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25211b8400, cur 1561202785 expire 1561202635 last 1561202558 Jun 22 04:26:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 768e69f1-686d-dc63-c888-d7b2745331f7 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fdaacc00, cur 1561202803 expire 1561202653 last 1561202576 Jun 22 04:26:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 22 04:27:32 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 04:27:32 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1895 previous similar messages Jun 22 04:29:18 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 04:29:18 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3076 previous similar messages Jun 22 04:37:33 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 81920 GRANT, real grant 0 Jun 22 04:37:33 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 435 previous similar messages Jun 22 04:39:18 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 04:39:18 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3391 previous similar messages Jun 22 04:47:37 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, 
real grant 0 Jun 22 04:47:37 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 626 previous similar messages Jun 22 04:49:19 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 04:49:19 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3215 previous similar messages Jun 22 04:57:44 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 04:57:44 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1073 previous similar messages Jun 22 04:59:19 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 04:59:19 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3782 previous similar messages Jun 22 05:07:55 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 05:07:55 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1176 previous similar messages Jun 22 05:09:20 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 05:09:20 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3081 previous similar messages Jun 22 05:18:00 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 05:18:00 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 839 previous similar messages Jun 22 05:19:21 fir-md1-s1 kernel: Lustre: 
21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 05:19:21 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2657 previous similar messages Jun 22 05:28:00 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 05:28:00 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1085 previous similar messages Jun 22 05:29:22 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 05:29:22 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3095 previous similar messages Jun 22 05:38:07 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 05:38:07 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1089 previous similar messages Jun 22 05:38:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2c97d373-364e-c157-5583-02820de3bb2e (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148c8cac00, cur 1561207121 expire 1561206971 last 1561206894 Jun 22 05:39:22 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 05:39:22 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1902 previous similar messages Jun 22 05:48:11 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 22 05:48:11 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 999 previous similar messages Jun 22 05:49:23 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 05:49:23 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2890 previous similar messages Jun 22 05:58:13 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 05:58:13 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 946 previous similar messages Jun 22 05:59:23 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 05:59:23 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4018 previous similar messages Jun 22 06:08:15 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 06:08:15 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 840 previous similar messages Jun 22 06:09:24 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for 
user 1: -22 Jun 22 06:09:24 fir-md1-s1 kernel: Lustre: 21417:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5540 previous similar messages Jun 22 06:18:24 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 06:18:24 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1077 previous similar messages Jun 22 06:19:27 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 06:19:27 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6484 previous similar messages Jun 22 06:28:27 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 06:28:27 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 976 previous similar messages Jun 22 06:29:27 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 06:29:27 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5334 previous similar messages Jun 22 06:38:29 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 06:38:29 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1047 previous similar messages Jun 22 06:39:28 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 06:39:28 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6116 previous similar messages Jun 22 06:48:30 fir-md1-s1 kernel: LustreError: 
21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 53248 GRANT, real grant 0 Jun 22 06:48:30 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1075 previous similar messages Jun 22 06:49:28 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 06:49:28 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4963 previous similar messages Jun 22 06:58:31 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 49152 GRANT, real grant 0 Jun 22 06:58:31 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 563 previous similar messages Jun 22 06:59:31 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 06:59:31 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1915 previous similar messages Jun 22 07:08:36 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 07:08:36 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 568 previous similar messages Jun 22 07:09:31 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 07:09:31 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3761 previous similar messages Jun 22 07:18:36 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 07:18:36 fir-md1-s1 kernel: LustreError: 
27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 730 previous similar messages Jun 22 07:19:32 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 07:19:32 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3753 previous similar messages Jun 22 07:28:38 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 07:28:38 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 746 previous similar messages Jun 22 07:29:32 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 07:29:32 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3204 previous similar messages Jun 22 07:38:38 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 07:38:38 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 651 previous similar messages Jun 22 07:48:38 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 07:48:38 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 692 previous similar messages Jun 22 07:58:49 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 22 07:58:49 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 723 previous similar messages Jun 22 08:08:50 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 08:08:50 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 477 previous similar messages Jun 22 08:18:56 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 22 08:18:56 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 482 previous similar messages Jun 22 08:25:14 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 08:25:14 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3181 previous similar messages Jun 22 08:29:09 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 49152 GRANT, real grant 0 Jun 22 08:29:09 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 969 previous similar messages Jun 22 08:33:06 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 08:36:33 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 08:36:33 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 22 08:39:16 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 08:39:16 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 710 previous similar messages Jun 22 08:42:28 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 
Jun 22 08:42:28 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jun 22 08:49:17 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 08:49:17 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 58538 previous similar messages Jun 22 08:52:31 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 08:52:31 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 647 previous similar messages Jun 22 08:59:21 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 69632 GRANT, real grant 0 Jun 22 08:59:21 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1355 previous similar messages Jun 22 09:02:32 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 09:02:32 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2533 previous similar messages Jun 22 09:11:31 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 09:11:31 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 412 previous similar messages Jun 22 09:12:34 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 09:12:34 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1433 previous similar messages Jun 22 09:22:34 fir-md1-s1 kernel: Lustre: 
21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 09:22:34 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2123 previous similar messages Jun 22 09:27:26 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 09:27:26 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jun 22 09:32:36 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 09:32:36 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2710 previous similar messages Jun 22 09:40:48 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 09:40:48 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 32 previous similar messages Jun 22 09:42:38 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 09:42:38 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2753 previous similar messages Jun 22 09:52:44 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 09:52:44 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2889 previous similar messages Jun 22 09:56:02 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 09:56:02 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 
previous similar messages Jun 22 10:02:46 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 10:02:46 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3156 previous similar messages Jun 22 10:06:37 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 10:06:37 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 11 previous similar messages Jun 22 10:12:48 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 10:12:48 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2405 previous similar messages Jun 22 10:16:47 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 10:16:47 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 224 previous similar messages Jun 22 10:22:54 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 10:22:54 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1604 previous similar messages Jun 22 10:26:49 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 10:26:49 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 238 previous similar messages Jun 22 10:33:36 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 10:33:36 fir-md1-s1 kernel: 
Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 292 previous similar messages Jun 22 10:36:55 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 10:36:55 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 237 previous similar messages Jun 22 10:44:38 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 10:44:38 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 396 previous similar messages Jun 22 10:46:55 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 10:46:55 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 257 previous similar messages Jun 22 10:55:30 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 10:55:30 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 267 previous similar messages Jun 22 10:56:58 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 10:56:58 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 217 previous similar messages Jun 22 11:05:37 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 11:05:37 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 313 previous similar messages Jun 22 11:06:58 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 11:06:58 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 246 previous similar messages Jun 22 11:16:07 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 11:16:07 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 475 previous similar messages Jun 22 11:16:58 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 45056 GRANT, real grant 0 Jun 22 11:16:58 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 previous similar messages Jun 22 11:26:27 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 11:26:27 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 372 previous similar messages Jun 22 11:26:59 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 11:26:59 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 244 previous similar messages Jun 22 11:36:59 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 11:36:59 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 220 previous similar messages Jun 22 11:37:03 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 11:37:03 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 229 previous similar messages Jun 22 11:47:08 
fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 11:47:08 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 249 previous similar messages Jun 22 11:51:18 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 11:51:18 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 319 previous similar messages Jun 22 11:57:22 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 69632 GRANT, real grant 0 Jun 22 11:57:22 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 220 previous similar messages Jun 22 12:02:26 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 12:02:26 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 193 previous similar messages Jun 22 12:07:25 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 12:07:25 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 228 previous similar messages Jun 22 12:08:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2ca8c1ab-ca57-7d24-398b-275ee2691945 (at 10.9.112.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4537aa7800, cur 1561230506 expire 1561230356 last 1561230279 Jun 22 12:08:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 12:08:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2ca8c1ab-ca57-7d24-398b-275ee2691945 (at 10.9.112.16@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2520a3c400, cur 1561230508 expire 1561230358 last 1561230281 Jun 22 12:08:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 22 12:17:29 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 12:17:29 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 214 previous similar messages Jun 22 12:27:32 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 12:27:32 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 247 previous similar messages Jun 22 12:32:52 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 12:32:52 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 31 previous similar messages Jun 22 12:34:17 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 12:34:17 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages Jun 22 12:37:02 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 12:37:02 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 21 previous similar messages Jun 22 12:37:39 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 12:37:39 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 217 previous similar messages Jun 22 12:43:40 fir-md1-s1 
kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 12:43:40 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 141 previous similar messages Jun 22 12:47:40 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 12:47:40 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 249 previous similar messages Jun 22 12:54:00 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 12:54:00 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 49 previous similar messages Jun 22 12:57:43 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 22 12:57:43 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 423 previous similar messages Jun 22 13:07:47 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 22 13:07:47 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1294 previous similar messages Jun 22 13:09:41 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 13:09:41 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 31 previous similar messages Jun 22 13:17:50 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 22 13:17:50 fir-md1-s1 kernel: LustreError: 
25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1284 previous similar messages Jun 22 13:27:51 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 13:27:51 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1348 previous similar messages Jun 22 13:28:18 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 13:28:18 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jun 22 13:31:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e6c09851-8594-e724-7da8-570118535052 (at 10.9.107.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2533e57400, cur 1561235462 expire 1561235312 last 1561235235 Jun 22 13:37:51 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 13:37:51 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 611 previous similar messages Jun 22 13:43:06 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 13:47:51 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 13:47:51 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 448 previous similar messages Jun 22 13:50:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 286d4aef-dd39-033a-885a-1b2f68dad8ee (at 10.9.112.16@o2ib4) Jun 22 13:50:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 13:50:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored 
to f60c199d-7611-7247-14ce-916a8ab83213 (at 10.9.112.13@o2ib4) Jun 22 13:50:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 13:53:09 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 13:53:09 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 169 previous similar messages Jun 22 13:53:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jun 22 13:53:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 13:57:52 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 13:57:52 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 446 previous similar messages Jun 22 13:59:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c579ffa9-959a-5f2e-006d-9d0dfdb5fa5a (at 10.8.17.26@o2ib6) Jun 22 13:59:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 14:00:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to dd99d9ee-4aca-6a76-941f-529d29521420 (at 10.8.2.28@o2ib6) Jun 22 14:00:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 14:01:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fc2d5c6b-10cd-8ca7-0b9f-2fba82f0b956 (at 10.8.10.27@o2ib6) Jun 22 14:01:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 22 14:04:29 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 14:04:29 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 287 previous similar messages Jun 22 14:07:53 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc 
claims 155648 GRANT, real grant 0 Jun 22 14:07:53 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 425 previous similar messages Jun 22 14:14:29 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 14:14:29 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1720 previous similar messages Jun 22 14:17:53 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 14:17:53 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 401 previous similar messages Jun 22 14:27:16 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 14:27:16 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 595 previous similar messages Jun 22 14:27:54 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 14:27:54 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 22 14:37:54 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 14:37:54 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 22 14:39:20 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 14:39:20 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 127 previous similar messages Jun 22 14:47:55 fir-md1-s1 kernel: LustreError: 
27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 14:47:55 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 14:51:13 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 14:51:13 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 22 14:57:56 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 14:57:56 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 22 15:01:27 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 15:01:27 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages Jun 22 15:07:57 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 15:07:57 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 397 previous similar messages Jun 22 15:11:27 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 15:11:27 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 50 previous similar messages Jun 22 15:17:57 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 15:17:57 fir-md1-s1 kernel: LustreError: 
22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 15:21:29 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 15:21:29 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 62 previous similar messages Jun 22 15:27:58 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 15:27:58 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 405 previous similar messages Jun 22 15:31:30 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 15:31:30 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 743 previous similar messages Jun 22 15:37:58 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 15:37:58 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 15:41:31 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 15:41:31 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1274 previous similar messages Jun 22 15:47:58 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 15:47:58 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 415 previous similar messages Jun 22 15:51:36 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 22 15:51:36 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1693 previous similar messages Jun 22 15:57:59 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 15:57:59 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 410 previous similar messages Jun 22 16:03:01 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 16:03:01 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jun 22 16:08:00 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 16:08:00 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 419 previous similar messages Jun 22 16:13:10 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 16:13:10 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages Jun 22 16:18:01 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 16:18:01 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 22 16:23:11 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 16:23:11 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 251 previous similar messages Jun 22 16:28:01 fir-md1-s1 kernel: LustreError: 
27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 16:28:01 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 420 previous similar messages Jun 22 16:33:12 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 16:33:12 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1059 previous similar messages Jun 22 16:38:02 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 16:38:02 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 424 previous similar messages Jun 22 16:43:12 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 16:43:12 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2159 previous similar messages Jun 22 16:48:02 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 16:48:02 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 420 previous similar messages Jun 22 16:53:12 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 16:53:12 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14293 previous similar messages Jun 22 16:58:04 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 16:58:04 fir-md1-s1 kernel: LustreError: 
22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 429 previous similar messages Jun 22 17:04:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 17:04:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16274 previous similar messages Jun 22 17:08:05 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 17:08:05 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 22 17:14:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 17:14:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1280 previous similar messages Jun 22 17:18:05 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 17:18:05 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 17:24:05 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 17:24:05 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 176 previous similar messages Jun 22 17:28:06 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 17:28:06 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 22 17:34:22 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the 
changelog for user 1: -22 Jun 22 17:34:22 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 86 previous similar messages Jun 22 17:38:06 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 17:38:06 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 17:44:26 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 17:44:26 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 382 previous similar messages Jun 22 17:48:07 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 17:48:07 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 22 17:54:26 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 17:54:26 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1754 previous similar messages Jun 22 17:58:07 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 17:58:07 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 22 18:08:08 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 18:08:08 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 408 previous similar messages Jun 22 18:14:04 fir-md1-s1 kernel: 
Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 18:14:04 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 22 18:18:09 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 18:18:09 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 22 18:25:53 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 18:25:53 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 22 18:28:10 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 18:28:10 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 431 previous similar messages Jun 22 18:38:11 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 18:38:11 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 405 previous similar messages Jun 22 18:42:48 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 18:42:48 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 22 18:48:12 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 18:48:12 fir-md1-s1 kernel: LustreError: 
27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 18:54:11 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 18:54:11 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jun 22 18:58:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 18:58:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 404 previous similar messages Jun 22 19:08:14 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 19:08:14 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 521 previous similar messages Jun 22 19:09:59 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 19:09:59 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 22 19:18:16 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 19:18:16 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 394 previous similar messages Jun 22 19:21:16 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 19:21:16 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 22 19:28:16 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 19:28:16 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 380 previous similar messages Jun 22 19:35:43 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 19:35:43 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 54 previous similar messages Jun 22 19:38:16 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 19:38:16 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 391 previous similar messages Jun 22 19:48:18 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 19:48:18 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jun 22 19:55:27 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 19:55:27 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 30 previous similar messages Jun 22 19:58:18 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 19:58:18 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 379 previous similar messages Jun 22 20:08:07 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 20:08:07 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jun 22 20:08:19 
fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 20:08:19 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 22 20:18:19 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 20:18:19 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 22 20:28:20 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 20:28:20 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 22 20:29:28 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 20:29:28 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jun 22 20:33:45 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 20:33:45 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jun 22 20:38:21 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 20:38:21 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 391 previous similar messages Jun 22 20:42:16 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 20:48:21 fir-md1-s1 kernel: LustreError: 
22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 20:48:21 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 381 previous similar messages Jun 22 20:58:22 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 20:58:22 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 380 previous similar messages Jun 22 20:58:27 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 20:58:27 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jun 22 20:59:34 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 20:59:34 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 170 previous similar messages Jun 22 21:00:49 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 21:00:49 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3728 previous similar messages Jun 22 21:03:47 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 21:03:47 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3997 previous similar messages Jun 22 21:08:23 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 21:08:23 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 390 
previous similar messages Jun 22 21:08:48 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 21:08:48 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28782 previous similar messages Jun 22 21:18:24 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 21:18:24 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 387 previous similar messages Jun 22 21:19:40 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 21:19:40 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 782 previous similar messages Jun 22 21:28:24 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 21:28:24 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 376 previous similar messages Jun 22 21:38:25 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 21:38:25 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 396 previous similar messages Jun 22 21:38:57 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 21:38:57 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 312 previous similar messages Jun 22 21:48:26 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 
21:48:26 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 22 21:49:46 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 21:49:46 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jun 22 21:58:27 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 21:58:27 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 404 previous similar messages Jun 22 22:00:12 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 22:00:12 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 20 previous similar messages Jun 22 22:08:27 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 22:08:27 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 400 previous similar messages Jun 22 22:10:12 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 22:10:12 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jun 22 22:18:29 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 22:18:29 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 22 22:28:30 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 22:28:30 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jun 22 22:38:31 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 22:38:31 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 404 previous similar messages Jun 22 22:39:18 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 22:39:18 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jun 22 22:44:23 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 22:44:23 fir-md1-s1 kernel: Lustre: 25681:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jun 22 22:46:53 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 22:46:53 fir-md1-s1 kernel: Lustre: 21673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 47930 previous similar messages Jun 22 22:48:31 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 22:48:31 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 418 previous similar messages Jun 22 22:52:00 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 22:52:00 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 56761 previous similar messages Jun 22 22:58:32 
fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 22:58:32 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 425 previous similar messages Jun 22 23:02:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 23:02:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11355 previous similar messages Jun 22 23:08:33 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 23:08:33 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 423 previous similar messages Jun 22 23:12:30 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 23:12:30 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 35 previous similar messages Jun 22 23:18:34 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 23:18:34 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 421 previous similar messages Jun 22 23:28:35 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 23:28:35 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 419 previous similar messages Jun 22 23:37:06 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 23:37:06 fir-md1-s1 kernel: Lustre: 
25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jun 22 23:38:35 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 23:38:35 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 413 previous similar messages Jun 22 23:40:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8fc07523-e22f-bfb9-0ffa-4aa1d872317e (at 10.8.7.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f451891f400, cur 1561272050 expire 1561271900 last 1561271823 Jun 22 23:40:50 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jun 22 23:42:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ea1cdf3e-c1a9-c826-73a8-fd54bacafbe5 (at 10.8.7.4@o2ib6) Jun 22 23:42:34 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jun 22 23:45:08 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 23:45:08 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 22 23:47:56 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 23:47:56 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 80 previous similar messages Jun 22 23:48:36 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 23:48:36 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 420 previous similar messages Jun 22 23:53:03 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 22 23:53:03 
fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 194 previous similar messages Jun 22 23:58:37 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 22 23:58:37 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 414 previous similar messages Jun 23 00:03:34 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 00:03:34 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 198 previous similar messages Jun 23 00:08:38 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 00:08:38 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 420 previous similar messages Jun 23 00:14:08 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 00:14:08 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 377 previous similar messages Jun 23 00:18:39 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 00:18:39 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 427 previous similar messages Jun 23 00:24:17 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 00:24:17 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 193 previous similar messages Jun 23 00:28:41 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 00:28:41 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 23 00:35:02 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 00:35:02 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 292 previous similar messages Jun 23 00:38:42 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 00:38:42 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 415 previous similar messages Jun 23 00:45:05 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 00:45:05 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 96 previous similar messages Jun 23 00:48:43 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 00:48:43 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 23 00:55:14 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 00:55:14 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 599 previous similar messages Jun 23 00:58:44 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 00:58:44 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2150 previous similar messages 
Jun 23 01:05:50 fir-md1-s1 kernel: Lustre: 10308:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 01:05:50 fir-md1-s1 kernel: Lustre: 10308:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1487 previous similar messages Jun 23 01:08:45 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 01:08:45 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 802 previous similar messages Jun 23 01:15:52 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 01:15:52 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 77 previous similar messages Jun 23 01:18:46 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 01:18:46 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 23 01:27:43 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 01:27:43 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages Jun 23 01:28:46 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 01:28:46 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1111 previous similar messages Jun 23 01:38:48 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 01:38:48 fir-md1-s1 kernel: 
LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1125 previous similar messages Jun 23 01:48:48 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 23 01:48:48 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1396 previous similar messages Jun 23 01:52:53 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 01:52:53 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 171 previous similar messages Jun 23 01:54:27 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 01:54:27 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jun 23 01:58:49 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 01:58:49 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1178 previous similar messages Jun 23 02:08:49 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 02:08:49 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1302 previous similar messages Jun 23 02:10:03 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:18:51 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 02:18:51 fir-md1-s1 kernel: LustreError: 
27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1208 previous similar messages Jun 23 02:19:45 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:20:41 fir-md1-s1 kernel: Lustre: 10308:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:20:41 fir-md1-s1 kernel: Lustre: 10308:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 21 previous similar messages Jun 23 02:21:57 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:21:57 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 193 previous similar messages Jun 23 02:24:31 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:24:31 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 138 previous similar messages Jun 23 02:28:51 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 23 02:28:51 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2402 previous similar messages Jun 23 02:36:30 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:36:30 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 131 previous similar messages Jun 23 02:38:52 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 02:38:52 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1179 previous similar messages Jun 23 
02:46:44 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 02:46:44 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1140 previous similar messages Jun 23 02:48:53 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 02:48:53 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1267 previous similar messages Jun 23 02:58:54 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 02:58:54 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 775 previous similar messages Jun 23 03:08:55 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 03:08:55 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 816 previous similar messages Jun 23 03:17:08 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 03:17:08 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 838 previous similar messages Jun 23 03:18:55 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 03:18:55 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1679 previous similar messages Jun 23 03:19:17 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 03:19:17 fir-md1-s1 kernel: Lustre: 
21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 37 previous similar messages Jun 23 03:22:05 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 03:22:05 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages Jun 23 03:27:17 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 03:27:17 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 142 previous similar messages Jun 23 03:28:57 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 03:28:57 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1219 previous similar messages Jun 23 03:38:57 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 03:38:57 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1149 previous similar messages Jun 23 03:48:58 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 23 03:48:58 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1576 previous similar messages Jun 23 03:58:58 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 03:58:58 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1228 previous similar messages Jun 23 04:09:00 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 04:09:00 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1332 previous similar messages Jun 23 04:19:00 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 04:19:00 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3231 previous similar messages Jun 23 04:29:01 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 04:29:01 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 905 previous similar messages Jun 23 04:39:02 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 04:39:02 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1154 previous similar messages Jun 23 04:49:02 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 04:49:02 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1147 previous similar messages Jun 23 04:59:03 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 04:59:03 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1436 previous similar messages Jun 23 05:09:03 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 147456 GRANT, real grant 0 Jun 23 05:09:03 fir-md1-s1 kernel: 
LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1435 previous similar messages Jun 23 05:19:03 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 61440 GRANT, real grant 0 Jun 23 05:19:03 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1395 previous similar messages Jun 23 05:29:04 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 05:29:04 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1483 previous similar messages Jun 23 05:39:05 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 05:39:05 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1391 previous similar messages Jun 23 05:49:06 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 05:49:06 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1465 previous similar messages Jun 23 05:59:07 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 05:59:07 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1578 previous similar messages Jun 23 06:09:08 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 06:09:08 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1494 previous similar messages Jun 23 06:19:09 fir-md1-s1 kernel: 
LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 06:19:09 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1266 previous similar messages Jun 23 06:29:09 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 06:29:09 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1203 previous similar messages Jun 23 06:39:09 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 06:39:09 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1386 previous similar messages Jun 23 06:49:10 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 06:49:10 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1413 previous similar messages Jun 23 06:59:12 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 06:59:12 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1391 previous similar messages Jun 23 07:09:12 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 07:09:12 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1183 previous similar messages Jun 23 07:19:12 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, 
real grant 0 Jun 23 07:19:12 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1037 previous similar messages Jun 23 07:29:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 07:29:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 931 previous similar messages Jun 23 07:39:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 07:39:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1132 previous similar messages Jun 23 07:49:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 07:49:13 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1108 previous similar messages Jun 23 07:59:14 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 07:59:14 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 999 previous similar messages Jun 23 08:09:15 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 08:09:15 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 932 previous similar messages Jun 23 08:19:15 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 08:19:15 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1250 previous similar 
messages Jun 23 08:29:16 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 08:29:16 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1334 previous similar messages Jun 23 08:39:16 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 98304 GRANT, real grant 0 Jun 23 08:39:16 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 30331 previous similar messages Jun 23 08:49:16 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 08:49:16 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 30305 previous similar messages Jun 23 08:56:31 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 23 08:56:31 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 100 previous similar messages Jun 23 08:59:17 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 08:59:17 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1472 previous similar messages Jun 23 09:08:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 23 09:08:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 23 09:09:17 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 09:09:17 fir-md1-s1 kernel: LustreError: 
22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 400 previous similar messages Jun 23 09:19:17 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 09:19:17 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 418 previous similar messages Jun 23 09:23:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 23 09:23:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 23 09:29:17 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 09:29:17 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 422 previous similar messages Jun 23 09:39:19 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 09:39:19 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 434 previous similar messages Jun 23 09:49:19 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 09:49:19 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 23 09:59:20 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 09:59:20 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 412 previous similar messages Jun 23 10:09:22 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 
155648 GRANT, real grant 0 Jun 23 10:09:22 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 23 10:19:22 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 10:19:22 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 693 previous similar messages Jun 23 10:29:24 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 10:29:24 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 834 previous similar messages Jun 23 10:31:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f1559460-8fda-b79d-be15-a1d7dda11872 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45181f9c00, cur 1561311114 expire 1561310964 last 1561310887 Jun 23 10:31:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 23 10:33:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d8cc7b58-ee01-5501-ca65-c659f4724147 (at 10.9.106.54@o2ib4) Jun 23 10:33:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 23 10:39:24 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 10:39:24 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 653 previous similar messages Jun 23 10:49:25 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 10:49:25 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 645 previous similar messages Jun 23 10:59:26 fir-md1-s1 kernel: 
LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 10:59:26 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 638 previous similar messages Jun 23 11:09:27 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 11:09:27 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 600 previous similar messages Jun 23 11:19:27 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 11:19:27 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 629 previous similar messages Jun 23 11:22:54 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561314167/real 1561314167] req@ffff8f0c20f08300 x1636711856945808/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1561314174 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 23 11:23:02 fir-md1-s1 kernel: Lustre: 21673:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0abb318000 x1634936627028896/t0(0) o101->bd073587-8042-ffd0-09f1-ff79e8722875@10.9.0.63@o2ib4:7/0 lens 480/568 e 1 to 0 dl 1561314187 ref 2 fl Interpret:/0/0 rc 0/0 Jun 23 11:23:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 23 11:23:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 23 11:23:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 23 11:23:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 23 
11:23:15 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561314188/real 1561314188] req@ffff8f0c20f08300 x1636711856945808/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1561314195 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 23 11:23:15 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 23 11:23:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 23 11:23:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 23 11:23:50 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561314223/real 1561314223] req@ffff8f0c20f08300 x1636711856945808/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1561314230 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 23 11:23:50 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 23 11:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 23 11:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 23 11:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 23 11:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 23 11:24:31 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1fda306300 x1631561955722480/t0(0) o101->ebb0ff39-b00e-6e1a-c25b-64754a77a1b9@10.8.0.82@o2ib6:6/0 
lens 576/3264 e 1 to 0 dl 1561314276 ref 2 fl Interpret:/0/0 rc 0/0 Jun 23 11:24:32 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d637b7b00 x1631562793996352/t0(0) o101->d594a152-d993-c755-50bf-0f3b806ddc60@10.9.107.22@o2ib4:7/0 lens 576/0 e 1 to 0 dl 1561314277 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 23 11:24:32 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 523 previous similar messages Jun 23 11:24:34 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2085b89800 x1631567984096800/t0(0) o101->442dd3b5-503d-fa23-0886-f83a3c7ec479@10.8.18.5@o2ib6:9/0 lens 576/0 e 1 to 0 dl 1561314279 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 23 11:24:34 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 142 previous similar messages Jun 23 11:24:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 23 11:24:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 23 11:24:38 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1fcda53600 x1634929795870016/t0(0) o101->749699ee-a0f2-6ab2-f022-71007184e2c9@10.8.8.23@o2ib6:13/0 lens 576/0 e 1 to 0 dl 1561314283 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 23 11:24:38 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 185 previous similar messages Jun 23 11:24:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b743407c-1f2a-22f5-529c-bf172a166e4e (at 10.8.2.20@o2ib6) Jun 23 11:24:42 fir-md1-s1 kernel: Lustre: Skipped 370 previous similar messages Jun 23 11:24:46 fir-md1-s1 kernel: Lustre: 
21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19acb25100 x1631582681370096/t0(0) o101->aba5d4eb-e07c-9b0f-6ab5-7f97caf38a26@10.8.16.4@o2ib6:21/0 lens 576/0 e 1 to 0 dl 1561314291 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 23 11:24:46 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 224 previous similar messages Jun 23 11:24:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client de259a64-2100-eb0d-e7c9-3532a08afec2 (at 10.9.102.41@o2ib4) reconnecting Jun 23 11:24:51 fir-md1-s1 kernel: Lustre: Skipped 523 previous similar messages Jun 23 11:24:58 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561314291/real 1561314291] req@ffff8f450387ec00 x1636711857283520/t0(0) o104->fir-MDT0000@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1561314298 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 23 11:24:58 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Jun 23 11:24:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Jun 23 11:24:58 fir-md1-s1 kernel: Lustre: Skipped 419 previous similar messages Jun 23 11:25:02 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18d08b4200 x1631538541031232/t0(0) o101->c3098872-1c7c-63b2-cf3c-a9a145f04126@10.8.18.31@o2ib6:7/0 lens 576/0 e 1 to 0 dl 1561314307 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 23 11:25:02 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 784 previous similar messages Jun 23 11:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c534882d-6030-1b8a-8c54-b433ef117432 (at 10.9.108.56@o2ib4) reconnecting Jun 23 11:25:23 fir-md1-s1 kernel: Lustre: Skipped 1151 previous similar 
messages Jun 23 11:25:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9bc02376-d562-fb9e-7cb7-2dc944d1678e (at 10.9.101.67@o2ib4) Jun 23 11:25:30 fir-md1-s1 kernel: Lustre: Skipped 1127 previous similar messages Jun 23 11:25:34 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1837d3f800 x1635373772455680/t0(0) o101->6d0f4c77-c27b-6d80-d629-873de917b74e@10.8.0.66@o2ib6:9/0 lens 576/0 e 0 to 0 dl 1561314339 ref 2 fl New:/2/ffffffff rc 0/-1 Jun 23 11:25:34 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1266 previous similar messages Jun 23 11:25:46 fir-md1-s1 kernel: LustreError: 21461:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561314256, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1b586a5100/0x5d9ee62174100794 lrc: 3/1,0 mode: --/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 1156 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21461 timeout: 0 lvb_type: 0 Jun 23 11:25:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561314346.24576 Jun 23 11:25:46 fir-md1-s1 kernel: LustreError: 21461:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 194 previous similar messages Jun 23 11:25:46 fir-md1-s1 kernel: LustreError: 23609:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561314256, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f06a6201d40/0x5d9ee62174101c78 lrc: 3/1,0 mode: --/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 1156 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23609 timeout: 0 lvb_type: 0 Jun 23 11:25:46 fir-md1-s1 kernel: LustreError: 23609:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 
70 previous similar messages Jun 23 11:25:47 fir-md1-s1 kernel: LustreError: 23679:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561314257, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f119a36b840/0x5d9ee62174102022 lrc: 3/1,0 mode: --/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 1156 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23679 timeout: 0 lvb_type: 0 Jun 23 11:25:47 fir-md1-s1 kernel: LustreError: 23679:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 61 previous similar messages Jun 23 11:25:49 fir-md1-s1 kernel: LustreError: 23729:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561314259, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f45180018c0/0x5d9ee621741022d0 lrc: 3/1,0 mode: --/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 1156 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23729 timeout: 0 lvb_type: 0 Jun 23 11:25:49 fir-md1-s1 kernel: LustreError: 23729:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 51 previous similar messages Jun 23 11:26:08 fir-md1-s1 kernel: LNet: Service thread pid 20458 was inactive for 200.35s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 23 11:26:08 fir-md1-s1 kernel: Pid: 20458, comm: mdt00_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 23 11:26:08 fir-md1-s1 kernel: Call Trace: Jun 23 11:26:08 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jun 23 11:26:08 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jun 23 11:26:08 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jun 23 11:26:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jun 23 11:26:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 23 11:26:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 23 11:26:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 23 11:26:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 23 11:26:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561314368.20458 Jun 23 11:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a63e3144-5861-13b0-6a48-7b4c39aca713 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0fb0d13000, cur 1561314380 expire 1561314230 last 1561314153 Jun 23 11:26:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 23 11:26:20 fir-md1-s1 kernel: Lustre: 23761:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:98s); client may timeout. req@ffff8f2baeefe600 x1631537339309024/t0(0) o101->f295817f-4700-452c-6407-60dfd6afbd18@10.9.104.4@o2ib4:12/0 lens 576/0 e 1 to 0 dl 1561314282 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jun 23 11:26:20 fir-md1-s1 kernel: LustreError: 25680:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.109.3@o2ib4: deadline 30:1s ago req@ffff8f3519ff4850 x1631615048905472/t0(0) o101->09300796-1183-3575-4e70-90c873be0aeb@10.9.109.3@o2ib4:19/0 lens 576/0 e 0 to 0 dl 1561314379 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jun 23 11:26:20 fir-md1-s1 kernel: LNet: Service thread pid 20458 completed after 212.35s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 23 11:26:20 fir-md1-s1 kernel: Lustre: 23761:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4721 previous similar messages Jun 23 11:26:20 fir-md1-s1 kernel: LustreError: 22280:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cf4e95db6620 vs. 
last_xid 5cf4e95db677f req@ffff8f178441da00 x1635311312135712/t0(0) o101->c33dfd3e-93e2-b1e4-c92b-6be01740e2e1@10.9.115.7@o2ib4:20/0 lens 576/0 e 0 to 0 dl 1561314410 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jun 23 11:26:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 397e53ea-489f-22f1-95c4-27ab82ab5709 (at 10.9.102.43@o2ib4) reconnecting Jun 23 11:26:28 fir-md1-s1 kernel: Lustre: Skipped 1985 previous similar messages Jun 23 11:26:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.108.22@o2ib4) Jun 23 11:26:35 fir-md1-s1 kernel: Lustre: Skipped 1763 previous similar messages Jun 23 11:28:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d8cc7b58-ee01-5501-ca65-c659f4724147 (at 10.9.106.54@o2ib4) Jun 23 11:28:58 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jun 23 11:29:28 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 69632 GRANT, real grant 0 Jun 23 11:29:28 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 481 previous similar messages Jun 23 11:39:28 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 11:39:28 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 651 previous similar messages Jun 23 11:49:28 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 11:49:28 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 615 previous similar messages Jun 23 11:59:29 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 11:59:29 fir-md1-s1 kernel: LustreError: 
27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 636 previous similar messages Jun 23 12:09:30 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 12:09:30 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 603 previous similar messages Jun 23 12:19:31 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 12:19:31 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 615 previous similar messages Jun 23 12:29:31 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 98304 GRANT, real grant 0 Jun 23 12:29:31 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 630 previous similar messages Jun 23 12:39:32 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 90112 GRANT, real grant 0 Jun 23 12:39:32 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 614 previous similar messages Jun 23 12:49:33 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 12:49:33 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 628 previous similar messages Jun 23 12:59:34 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 12:59:34 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 618 previous similar messages Jun 23 13:09:34 fir-md1-s1 kernel: LustreError: 
21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 13:09:34 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1331 previous similar messages Jun 23 13:19:35 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 13:19:35 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1627 previous similar messages Jun 23 13:29:36 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 13:29:36 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1629 previous similar messages Jun 23 13:39:37 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 13:39:37 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1386 previous similar messages Jun 23 13:49:37 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 13:49:37 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 394 previous similar messages Jun 23 13:59:38 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 13:59:38 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 398 previous similar messages Jun 23 14:09:38 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 
23 14:09:38 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 386 previous similar messages Jun 23 14:19:39 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 14:19:39 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jun 23 14:29:40 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 14:29:40 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 23 14:39:41 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 14:39:41 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 383 previous similar messages Jun 23 14:49:41 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 14:49:41 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 411 previous similar messages Jun 23 14:59:41 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 14:59:41 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 417 previous similar messages Jun 23 15:09:42 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 15:09:42 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 423 previous similar messages Jun 23 
15:19:43 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 15:19:43 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 417 previous similar messages Jun 23 15:29:44 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 15:29:44 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1694 previous similar messages Jun 23 15:39:44 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 15:39:44 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 23 15:49:44 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 15:49:44 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 610 previous similar messages Jun 23 15:59:46 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 15:59:46 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 899 previous similar messages Jun 23 16:09:46 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 23 16:09:46 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 815 previous similar messages Jun 23 16:19:47 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc 
claims 155648 GRANT, real grant 0 Jun 23 16:19:47 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 898 previous similar messages Jun 23 16:29:47 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 16:29:47 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 725 previous similar messages Jun 23 16:39:47 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 16:39:47 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1026 previous similar messages Jun 23 16:49:48 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 16:49:48 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1028 previous similar messages Jun 23 16:59:50 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 16:59:50 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 618 previous similar messages Jun 23 17:09:50 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 17:09:50 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 381 previous similar messages Jun 23 17:19:51 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 17:19:51 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 391 
previous similar messages Jun 23 17:29:52 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 17:29:52 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 379 previous similar messages Jun 23 17:39:53 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 17:39:53 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 386 previous similar messages Jun 23 17:49:53 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 17:49:53 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 391 previous similar messages Jun 23 17:59:55 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 17:59:55 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 378 previous similar messages Jun 23 18:09:55 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 18:09:55 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 381 previous similar messages Jun 23 18:19:56 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 18:19:56 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 393 previous similar messages Jun 23 18:29:57 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 18:29:57 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 396 previous similar messages Jun 23 18:39:57 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 18:39:57 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 23 18:49:57 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 18:49:57 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 367 previous similar messages Jun 23 18:59:58 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 18:59:58 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 405 previous similar messages Jun 23 19:09:58 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 19:09:58 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 411 previous similar messages Jun 23 19:19:59 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 19:19:59 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 413 previous similar messages Jun 23 19:30:00 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 19:30:00 fir-md1-s1 kernel: LustreError: 
21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 410 previous similar messages Jun 23 19:40:00 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 19:40:00 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 415 previous similar messages Jun 23 19:50:01 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 19:50:01 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 23 20:00:02 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 20:00:02 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 412 previous similar messages Jun 23 20:10:02 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 20:10:02 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 415 previous similar messages Jun 23 20:20:03 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 20:20:03 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 415 previous similar messages Jun 23 20:30:04 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 20:30:04 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 414 previous similar messages Jun 23 20:40:05 fir-md1-s1 kernel: LustreError: 
25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 20:40:05 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 417 previous similar messages Jun 23 20:50:05 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 20:50:05 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 420 previous similar messages Jun 23 21:00:09 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 21:00:09 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 420 previous similar messages Jun 23 21:10:10 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 21:10:10 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 418 previous similar messages Jun 23 21:20:11 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 21:20:11 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 398 previous similar messages Jun 23 21:30:11 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 21:30:11 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 431 previous similar messages Jun 23 21:40:13 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 
21:40:13 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 23 21:50:14 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 21:50:14 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 413 previous similar messages Jun 23 22:00:14 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 22:00:14 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 410 previous similar messages Jun 23 22:10:15 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 22:10:15 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 23 22:20:17 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 22:20:17 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 444 previous similar messages Jun 23 22:30:18 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 22:30:18 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 23 22:40:19 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 22:40:19 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 428 previous similar messages Jun 23 22:50:19 
fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 22:50:19 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 421 previous similar messages Jun 23 23:00:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 23:00:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 444 previous similar messages Jun 23 23:10:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 23:10:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 434 previous similar messages Jun 23 23:20:20 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 23:20:20 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 427 previous similar messages Jun 23 23:30:20 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 23:30:20 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 452 previous similar messages Jun 23 23:40:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 23 23:40:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 442 previous similar messages Jun 23 23:50:21 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 
155648 GRANT, real grant 0 Jun 23 23:50:21 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 444 previous similar messages Jun 24 00:00:22 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 00:00:22 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 526 previous similar messages Jun 24 00:10:22 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 77824 GRANT, real grant 0 Jun 24 00:10:22 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1469 previous similar messages Jun 24 00:20:23 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 00:20:23 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1294 previous similar messages Jun 24 00:30:24 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 00:30:24 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 452 previous similar messages Jun 24 00:40:24 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 00:40:24 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 423 previous similar messages Jun 24 00:50:25 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 00:50:25 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 875 
previous similar messages Jun 24 00:56:28 fir-md1-s1 kernel: Lustre: 23632:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jun 24 01:00:26 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 98304 GRANT, real grant 0 Jun 24 01:00:26 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3524 previous similar messages Jun 24 01:10:27 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 01:10:27 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1461 previous similar messages Jun 24 01:20:27 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 01:20:27 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1715 previous similar messages Jun 24 01:30:27 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 01:30:27 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2281 previous similar messages Jun 24 01:35:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33dff121-95b2-ba7a-9b08-f634d4e72016 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3c81bd0000, cur 1561365341 expire 1561365191 last 1561365114 Jun 24 01:35:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 01:40:28 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 01:40:28 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2180 previous similar messages Jun 24 01:50:28 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 131072 GRANT, real grant 0 Jun 24 01:50:28 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2166 previous similar messages Jun 24 02:00:28 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 24 02:00:28 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2162 previous similar messages Jun 24 02:10:29 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 02:10:29 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2193 previous similar messages Jun 24 02:20:30 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 02:20:30 fir-md1-s1 kernel: LustreError: 21365:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1878 previous similar messages Jun 24 02:30:30 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 02:30:30 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3293 previous 
similar messages Jun 24 02:40:31 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 02:40:31 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2339 previous similar messages Jun 24 02:50:31 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 02:50:31 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2176 previous similar messages Jun 24 03:00:31 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 40960 GRANT, real grant 0 Jun 24 03:00:31 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1869 previous similar messages Jun 24 03:10:32 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 28672 GRANT, real grant 0 Jun 24 03:10:32 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1721 previous similar messages Jun 24 03:20:32 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 24 03:20:32 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2826 previous similar messages Jun 24 03:30:33 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 03:30:33 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1996 previous similar messages Jun 24 03:40:34 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 03:40:34 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2215 previous similar messages Jun 24 03:50:35 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 03:50:35 fir-md1-s1 kernel: LustreError: 27586:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2404 previous similar messages Jun 24 04:00:35 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 04:00:35 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2034 previous similar messages Jun 24 04:10:35 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 04:10:35 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1942 previous similar messages Jun 24 04:20:36 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 04:20:36 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2810 previous similar messages Jun 24 04:30:36 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 04:30:36 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2818 previous similar messages Jun 24 04:40:36 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 28672 GRANT, real grant 0 Jun 24 04:40:36 fir-md1-s1 kernel: LustreError: 
21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1656 previous similar messages Jun 24 04:50:37 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 04:50:37 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2093 previous similar messages Jun 24 05:00:37 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 05:00:37 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2219 previous similar messages Jun 24 05:10:38 fir-md1-s1 kernel: LustreError: 27482:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 05:10:38 fir-md1-s1 kernel: LustreError: 27482:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2265 previous similar messages Jun 24 05:20:38 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 131072 GRANT, real grant 0 Jun 24 05:20:38 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1955 previous similar messages Jun 24 05:30:38 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 05:30:38 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2149 previous similar messages Jun 24 05:40:39 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 05:40:39 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2263 previous similar messages Jun 24 05:41:29 fir-md1-s1 kernel: Lustre: 
23571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jun 24 05:41:29 fir-md1-s1 kernel: Lustre: 23571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 450 previous similar messages Jun 24 05:50:40 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 24 05:50:40 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2249 previous similar messages Jun 24 06:00:40 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 06:00:40 fir-md1-s1 kernel: LustreError: 27587:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2070 previous similar messages Jun 24 06:10:40 fir-md1-s1 kernel: LustreError: 25634:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 06:10:40 fir-md1-s1 kernel: LustreError: 25634:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1812 previous similar messages Jun 24 06:20:41 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 32768 GRANT, real grant 0 Jun 24 06:20:41 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1985 previous similar messages Jun 24 06:30:41 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 24 06:30:41 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2000 previous similar messages Jun 24 06:40:43 fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli e18301fc-f860-0db4-bf24-6c606e0cc839 claims 155648 GRANT, real grant 0 Jun 24 06:40:43 
fir-md1-s1 kernel: LustreError: 27582:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1698 previous similar messages Jun 24 06:50:43 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 06:50:43 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1690 previous similar messages Jun 24 07:00:43 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 07:00:43 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1704 previous similar messages Jun 24 07:10:45 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 07:10:45 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1877 previous similar messages Jun 24 07:20:45 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 07:20:45 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1884 previous similar messages Jun 24 07:30:45 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 07:30:45 fir-md1-s1 kernel: LustreError: 21453:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1910 previous similar messages Jun 24 07:40:46 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 07:40:46 fir-md1-s1 kernel: LustreError: 27604:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1957 previous similar messages Jun 24 07:50:47 
fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 07:50:47 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1806 previous similar messages Jun 24 08:00:48 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 08:00:48 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1704 previous similar messages Jun 24 08:10:48 fir-md1-s1 kernel: LustreError: 44044:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 08:10:48 fir-md1-s1 kernel: LustreError: 44044:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1826 previous similar messages Jun 24 08:20:48 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 08:20:48 fir-md1-s1 kernel: LustreError: 27584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1857 previous similar messages Jun 24 08:30:49 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 08:30:49 fir-md1-s1 kernel: LustreError: 27581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1991 previous similar messages Jun 24 08:40:49 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 08:40:49 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1771 previous similar messages Jun 24 08:50:50 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc 
claims 155648 GRANT, real grant 0 Jun 24 08:50:50 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31102 previous similar messages Jun 24 08:58:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0d7a1f08-916e-8a37-613f-9b8d0fd14474 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3505f17c00, cur 1561391910 expire 1561391760 last 1561391683 Jun 24 08:58:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 08:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2dbcbb3b-0ac9-659c-3a0d-f7bf6c0943e2 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1cd69e6400, cur 1561391918 expire 1561391768 last 1561391691 Jun 24 08:58:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 08:58:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec76f1db-9c9b-bbe0-847f-90a9d517c8dc (at 10.8.9.8@o2ib6) Jun 24 08:58:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 09:00:51 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 09:00:51 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31049 previous similar messages Jun 24 09:10:52 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 09:10:52 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2355 previous similar messages Jun 24 09:20:52 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 09:20:52 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1369 previous similar messages Jun 24 
09:30:53 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 09:30:53 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1380 previous similar messages Jun 24 09:40:54 fir-md1-s1 kernel: LustreError: 46593:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 09:40:54 fir-md1-s1 kernel: LustreError: 46593:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1377 previous similar messages Jun 24 09:50:55 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 09:50:55 fir-md1-s1 kernel: LustreError: 21543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 964 previous similar messages Jun 24 10:00:56 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 10:00:56 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 924 previous similar messages Jun 24 10:10:57 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 10:10:57 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1015 previous similar messages Jun 24 10:20:58 fir-md1-s1 kernel: LustreError: 22157:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli be42b497-ab1b-8d58-3101-014aad577cfc claims 155648 GRANT, real grant 0 Jun 24 10:20:58 fir-md1-s1 kernel: LustreError: 22157:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1270 previous similar messages Jun 24 10:25:01 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending 
early reply req@ffff8f207fc0b600 x1635619423618592/t0(0) o101->d072205a-1b1b-636c-7696-e9d92af1edee@10.8.20.3@o2ib6:6/0 lens 480/568 e 1 to 0 dl 1561397106 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 10:25:01 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2876 previous similar messages Jun 24 10:25:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 806a1caf-1a24-de27-ca27-ac4ae7fd55bf (at 10.8.23.1@o2ib6) reconnecting Jun 24 10:25:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jun 24 10:25:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3ccf2b17-86d6-784b-9db3-f8aabdd282e7 (at 10.8.23.1@o2ib6) Jun 24 10:25:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 10:25:09 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a55f32d00 x1631544122849664/t0(0) o101->cec4ce3d-7421-61e4-362c-c29b7d79240a@10.8.27.10@o2ib6:14/0 lens 1768/0 e 1 to 0 dl 1561397114 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 24 10:25:09 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 354 previous similar messages Jun 24 10:25:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6d3df076-afbd-3346-95f4-6badbc5617da (at 10.9.105.32@o2ib4) Jun 24 10:25:11 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jun 24 10:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 98ff8e84-1e9a-d223-7706-0c3e5612efc7 (at 10.8.0.82@o2ib6) Jun 24 10:25:19 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jun 24 10:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 10:25:23 fir-md1-s1 kernel: Lustre: Skipped 140 previous similar messages Jun 24 10:25:25 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time 
(5/5), not sending early reply req@ffff8f202ec24500 x1634072229373280/t0(0) o101->c6e3bcd8-71de-d683-20ac-e6684b91d659@10.9.108.10@o2ib4:0/0 lens 576/0 e 1 to 0 dl 1561397130 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 24 10:25:25 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 300 previous similar messages Jun 24 10:25:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 5ecd3339-79cd-a67e-2a5c-bb3ff2529a3c (at 10.8.27.10@o2ib6) Jun 24 10:25:36 fir-md1-s1 kernel: Lustre: Skipped 164 previous similar messages Jun 24 10:25:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 4dc6ad45-c67c-15d0-5638-611b0defe5f9 (at 10.8.16.2@o2ib6) reconnecting Jun 24 10:25:55 fir-md1-s1 kernel: Lustre: Skipped 341 previous similar messages Jun 24 10:25:57 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f16af016300 x1634121072321440/t0(0) o101->c1420e99-ffe3-a133-75d0-8971e96a81cc@10.9.106.36@o2ib4:2/0 lens 1768/0 e 1 to 0 dl 1561397162 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 24 10:25:57 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 557 previous similar messages Jun 24 10:26:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9c31a52c-496a-b48d-6003-e6fdea2226d9 (at 10.9.104.22@o2ib4) Jun 24 10:26:08 fir-md1-s1 kernel: Lustre: Skipped 289 previous similar messages Jun 24 10:26:16 fir-md1-s1 kernel: LustreError: 97654:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561397086, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0525ad6540/0x5d9ee622c3bc50ab lrc: 3/0,1 mode: --/PW res: [0x200029bbb:0xd:0x0].0x0 bits 0x40/0x0 rrc: 257 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97654 timeout: 0 lvb_type: 0 Jun 24 10:26:16 fir-md1-s1 kernel: LustreError: 
97654:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 132 previous similar messages Jun 24 10:26:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6) reconnecting Jun 24 10:26:59 fir-md1-s1 kernel: Lustre: Skipped 724 previous similar messages Jun 24 10:27:01 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2071928f00 x1635085484194160/t0(0) o101->a2c269ef-57a9-8b99-0a4b-44a7d221d7bd@10.9.109.36@o2ib4:6/0 lens 1768/0 e 1 to 0 dl 1561397226 ref 2 fl New:/0/ffffffff rc 0/-1 Jun 24 10:27:01 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1474 previous similar messages Jun 24 10:27:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1733e647-dff2-c8f6-7390-5c06c673deac (at 10.9.109.31@o2ib4) Jun 24 10:27:12 fir-md1-s1 kernel: Lustre: Skipped 751 previous similar messages Jun 24 10:27:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.10.20@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f3fec641680/0x5d9ee622c3bc509d lrc: 3/0,0 mode: PW/PW res: [0x200029bbb:0xd:0x0].0x0 bits 0x40/0x0 rrc: 257 type: IBT flags: 0x60200400000020 nid: 10.8.10.20@o2ib6 remote: 0xc48f9d87344ea8bd expref: 85 pid: 24577 timeout: 512295 lvb_type: 0 Jun 24 10:27:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 24 10:27:15 fir-md1-s1 kernel: Lustre: 97654:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:129s); client may timeout. 
req@ffff8f1d5bba9200 x1633783014311664/t0(0) o101->19313a8c-b11b-17b1-39e1-85aeb6c20cba@10.8.15.9@o2ib6:6/0 lens 1768/0 e 1 to 0 dl 1561397106 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jun 24 10:27:16 fir-md1-s1 kernel: LustreError: 50448:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f24ee52e000 ns: mdt-fir-MDT0000_UUID lock: ffff8f23ebb60fc0/0x5d9ee622c3bc50dc lrc: 3/0,0 mode: PW/PW res: [0x200029bbb:0xd:0x0].0x0 bits 0x40/0x0 rrc: 250 type: IBT flags: 0x50200400000020 nid: 10.8.10.20@o2ib6 remote: 0xc48f9d87344ea8c4 expref: 17 pid: 50448 timeout: 0 lvb_type: 0 Jun 24 10:27:16 fir-md1-s1 kernel: LustreError: 24577:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.16.5@o2ib6: deadline 20:1s ago req@ffff8f17e2647500 x1634924007495888/t0(0) o101->1fb1c1bc-a5c2-7639-1248-10341b490c82@10.8.16.5@o2ib6:14/0 lens 1768/0 e 0 to 0 dl 1561397234 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jun 24 10:27:16 fir-md1-s1 kernel: LustreError: 24577:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 48 previous similar messages Jun 24 10:27:16 fir-md1-s1 kernel: Lustre: 97654:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2747 previous similar messages Jun 24 10:27:19 fir-md1-s1 kernel: LustreError: 97642:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f24ee52e000 ns: mdt-fir-MDT0000_UUID lock: ffff8f24f0ed33c0/0x5d9ee622c3bc65b2 lrc: 3/0,0 mode: PW/PW res: [0x200029bbb:0xd:0x0].0x0 bits 0x40/0x0 rrc: 174 type: IBT flags: 0x50200400000020 nid: 10.8.10.20@o2ib6 remote: 0xc48f9d87344ea949 expref: 10 pid: 97642 timeout: 0 lvb_type: 0 Jun 24 10:27:19 fir-md1-s1 kernel: LustreError: 97642:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Jun 24 10:27:34 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (167:1s); client may timeout. 
req@ffff8f18887a8f00 x1631646255656320/t0(0) o101->f03aa5e8-f764-2262-c217-2e99830bfe5f@10.8.22.34@o2ib6:6/0 lens 480/536 e 1 to 0 dl 1561397253 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 10:27:34 fir-md1-s1 kernel: LustreError: 20724:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f24ee52e000 ns: mdt-fir-MDT0000_UUID lock: ffff8f16b4cb2d00/0x5d9ee622c3bc6bcb lrc: 3/0,0 mode: PW/PW res: [0x200029bbb:0xd:0x0].0x0 bits 0x40/0x0 rrc: 147 type: IBT flags: 0x50200400000020 nid: 10.8.10.20@o2ib6 remote: 0xc48f9d87344ea981 expref: 8 pid: 20724 timeout: 0 lvb_type: 0 Jun 24 10:27:34 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 16 previous similar messages Jun 24 12:34:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jun 24 12:34:54 fir-md1-s1 kernel: Lustre: Skipped 147 previous similar messages Jun 24 12:35:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cfd4e192-da61-c95f-6005-fc026e176bd8 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520744400, cur 1561404903 expire 1561404753 last 1561404676 Jun 24 13:05:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jun 24 13:05:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 13:05:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 460d4bd1-5320-0f4d-604d-3fee0115b165 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1a0fef1000, cur 1561406728 expire 1561406578 last 1561406501 Jun 24 13:05:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 14:22:51 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561411364/real 1561411364] req@ffff8f07d12de000 x1636713186879872/t0(0) o104->fir-MDT0002@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561411371 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 14:22:51 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages Jun 24 14:22:59 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ee9503900 x1636449140881520/t0(0) o36->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:4/0 lens 520/448 e 1 to 0 dl 1561411384 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 14:22:59 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 413 previous similar messages Jun 24 14:23:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 59f098aa-fb21-8ed8-84bd-d0ce06cad654 (at 10.9.102.46@o2ib4) reconnecting Jun 24 14:23:05 fir-md1-s1 kernel: Lustre: Skipped 269 previous similar messages Jun 24 14:23:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 460b4624-f225-0fc6-9d6f-aee495221c30 (at 10.9.102.46@o2ib4) Jun 24 14:23:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 14:23:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 826fbeb7-54e9-5127-860e-c32891bc78a7 (at 10.9.107.9@o2ib4) Jun 24 14:23:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 14:23:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d90e9165-328c-67de-acd1-290e1860ac02 (at 10.8.16.7@o2ib6) Jun 24 14:23:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8ab6533c-237c-52d9-a0d0-b7b0b3591cd2 (at 
10.9.108.34@o2ib4) Jun 24 14:23:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 14:23:12 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561411385/real 1561411385] req@ffff8f07d12de000 x1636713186879872/t0(0) o104->fir-MDT0002@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561411392 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 24 14:23:12 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 24 14:23:16 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f215f817200 x1631561721028144/t0(0) o101->b4e75cd9-74c7-0ec8-2651-b87e466f256d@10.9.105.70@o2ib4:21/0 lens 576/3264 e 1 to 0 dl 1561411401 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 14:23:16 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jun 24 14:23:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to dce245ee-1721-1fa3-f0f5-8ef6b7994bca (at 10.9.105.27@o2ib4) Jun 24 14:23:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 24 14:23:19 fir-md1-s1 kernel: LustreError: 10506:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.9@o2ib6) failed to reply to blocking AST (req@ffff8f07d12de000 x1636713186879872 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2183e38d80/0x5d9ee62316112cd2 lrc: 4/0,0 mode: PR/PR res: [0x2c0024163:0x19838:0x0].0x0 bits 0x13/0x0 rrc: 342 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x808140e2f7ea8097 expref: 650 pid: 22007 timeout: 526481 lvb_type: 0 Jun 24 14:23:19 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.9.9@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 24 14:23:19 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages 
Jun 24 14:23:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2183e38d80/0x5d9ee62316112cd2 lrc: 3/0,0 mode: PR/PR res: [0x2c0024163:0x19838:0x0].0x0 bits 0x13/0x0 rrc: 342 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x808140e2f7ea8097 expref: 651 pid: 22007 timeout: 0 lvb_type: 0 Jun 24 14:23:19 fir-md1-s1 kernel: Lustre: 23644:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f0c52e23f00 x1631537617276800/t0(0) o101->7384665e-bddc-c186-a2f8-10bf76931a32@10.9.106.44@o2ib4:18/0 lens 576/536 e 1 to 0 dl 1561411398 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 14:23:19 fir-md1-s1 kernel: Lustre: 23644:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Jun 24 14:24:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 24 14:24:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 14:24:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 804bb2d0-a656-6c01-b0db-5b53058fb0f9 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24efef9400, cur 1561411469 expire 1561411319 last 1561411242 Jun 24 14:24:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 14:24:43 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f253b992000, cur 1561411483 expire 1561411333 last 1561411256 Jun 24 15:00:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 24 15:00:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 15:00:06 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jun 24 15:00:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 24 15:00:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 15:00:06 fir-md1-s1 kernel: Lustre: Skipped 233 previous similar messages Jun 24 15:00:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 24 15:00:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 24 15:00:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 15:00:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 24 15:00:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:00:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 24 15:00:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 15:00:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:01:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 24 15:01:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:01:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 24 15:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 15:01:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 15:01:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 24 15:01:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:01:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 24 15:02:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 24 15:02:57 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 24 15:15:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) reconnecting Jun 24 15:15:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Jun 24 15:15:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 24 15:15:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 15:15:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 24 15:15:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) reconnecting Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 15:15:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 24 15:17:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 24 15:17:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 24 15:17:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 24 15:23:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 24 15:23:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:23:15 fir-md1-s1 kernel: LustreError: 25997:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2521cd0c50 x1634306077191856/t0(0) o4->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:9/0 lens 488/448 e 0 to 0 dl 1561415019 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write 
error with a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6), client will retry: rc = -110 Jun 24 15:23:15 fir-md1-s1 kernel: LustreError: 25997:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jun 24 15:23:44 fir-md1-s1 kernel: Lustre: 23455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f214b22aa00 x1636441686336256/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:19/0 lens 376/1600 e 1 to 0 dl 1561415029 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:23:44 fir-md1-s1 kernel: Lustre: 23455:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jun 24 15:42:31 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:42:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 13458280-a046-3a7f-2bec-0301aba013a1 (at 10.8.28.12@o2ib6) reconnecting Jun 24 15:42:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:42:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d0d1dcda-abd5-29f1-1250-5971b6db7d8a (at 10.8.28.12@o2ib6) Jun 24 15:42:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:46:17 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:46:17 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jun 24 15:46:22 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:46:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0 (at 10.8.29.6@o2ib6) reconnecting Jun 24 15:46:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:46:23 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Connection restored to 0af4f40a-317e-88ce-7d9c-c4839b78e5a4 (at 10.8.29.6@o2ib6) Jun 24 15:46:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:46:24 fir-md1-s1 kernel: LustreError: 21543:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1ef3ac9450 x1636443218314096/t0(0) o3->7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0@10.8.29.6@o2ib6:23/0 lens 488/440 e 0 to 0 dl 1561416413 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:46:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0 (at 10.8.29.6@o2ib6), client will retry: rc -110 Jun 24 15:46:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:46:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jun 24 15:46:32 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:46:32 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Jun 24 15:46:42 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:46:42 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 12 previous similar messages Jun 24 15:46:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9081d826-2f83-5b46-ff73-7e6473184838 (at 10.8.17.25@o2ib6) reconnecting Jun 24 15:46:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 420c129b-df9e-b1c5-eae5-667fed64bb9d (at 10.8.15.3@o2ib6) Jun 24 15:46:57 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 24 15:46:58 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform 
health checking (-125, 0) Jun 24 15:46:58 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 24 previous similar messages Jun 24 15:46:59 fir-md1-s1 kernel: LustreError: 46578:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1974bae450 x1631566216131696/t0(0) o4->be42b497-ab1b-8d58-3101-014aad577cfc@10.8.27.35@o2ib6:26/0 lens 488/448 e 0 to 0 dl 1561416446 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:47:01 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1cd303a000 Jun 24 15:47:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with be42b497-ab1b-8d58-3101-014aad577cfc (at 10.8.27.35@o2ib6), client will retry: rc = -110 Jun 24 15:47:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:47:03 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f24e889dc50 x1631557242040512/t0(0) o4->84fd8c4b-6545-cd41-282d-ef5f651cba30@10.8.17.11@o2ib6:29/0 lens 488/448 e 0 to 0 dl 1561416449 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:47:03 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jun 24 15:47:04 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f13b622f400 Jun 24 15:47:04 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22bf91e000 Jun 24 15:47:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6), client will retry: rc = -110 Jun 24 15:47:13 fir-md1-s1 kernel: LustreError: 22730:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1f9946ec50 x1631538709023600/t0(0) o4->ca15d879-1cb2-8780-e5e2-20230d9e27cf@10.8.28.3@o2ib6:10/0 lens 488/448 e 0 to 0 dl 1561416460 ref 1 fl 
Interpret:/0/0 rc 0/0 Jun 24 15:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8e6b7782-0f04-da33-0138-eab1c9e41ffb (at 10.8.18.25@o2ib6) reconnecting Jun 24 15:47:16 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jun 24 15:47:16 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b0b10c800 Jun 24 15:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ca15d879-1cb2-8780-e5e2-20230d9e27cf (at 10.8.28.3@o2ib6), client will retry: rc = -110 Jun 24 15:47:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:47:17 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e3f2bf600 Jun 24 15:47:25 fir-md1-s1 kernel: Lustre: 21433:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561416438/real 0] req@ffff8f18dbffad00 x1636713474198768/t0(0) o104->fir-MDT0002@10.8.8.17@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561416445 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 15:47:25 fir-md1-s1 kernel: Lustre: 21433:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 24 15:47:31 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:47:31 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 41 previous similar messages Jun 24 15:47:33 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561416446/real 0] req@ffff8f1a75e0bf00 x1636713474219552/t0(0) o104->fir-MDT0000@10.8.29.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561416453 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 15:47:33 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jun 24 15:47:43 fir-md1-s1 
kernel: LustreError: 46578:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2521cd4450 x1636443218401408/t0(0) o3->7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0@10.8.29.6@o2ib6:12/0 lens 488/440 e 0 to 0 dl 1561416492 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:47:43 fir-md1-s1 kernel: LustreError: 46578:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jun 24 15:47:47 fir-md1-s1 kernel: Lustre: 21460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561416460/real 0] req@ffff8f251dbab300 x1636713474283296/t0(0) o104->fir-MDT0000@10.8.29.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561416467 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 15:47:48 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1696a6a600 Jun 24 15:47:48 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f192fb0f400 Jun 24 15:47:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 13458280-a046-3a7f-2bec-0301aba013a1 (at 10.8.28.12@o2ib6), client will retry: rc = -110 Jun 24 15:47:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:47:52 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d7b9b0800 Jun 24 15:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0 (at 10.8.29.6@o2ib6), client will retry: rc -110 Jun 24 15:47:54 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22443e7c00 Jun 24 15:47:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6), client will retry: rc = -110 Jun 24 15:47:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:48:00 fir-md1-s1 kernel: Lustre: 
23660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19eaf2f500 x1636449131623072/t0(0) o36->9d52b61d-61c3-c5c4-3713-7cb415666394@10.9.102.34@o2ib4:5/0 lens 520/448 e 1 to 0 dl 1561416485 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:48:02 fir-md1-s1 kernel: Lustre: 23743:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f29b6764e00 x1631563634202912/t0(0) o101->3ef17f0c-d35b-8428-c1da-c84a40a8bdbc@10.9.101.71@o2ib4:7/0 lens 576/3264 e 1 to 0 dl 1561416487 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:48:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 327c2a50-dba2-1c9c-0f3d-801872275c5c (at 10.8.18.26@o2ib6) Jun 24 15:48:05 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jun 24 15:48:10 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2505939050 x1631301640417632/t0(0) o4->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:1/0 lens 488/448 e 0 to 0 dl 1561416511 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:48:10 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jun 24 15:48:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44e6a79800 Jun 24 15:48:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6), client will retry: rc = -110 Jun 24 15:48:18 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561416491/real 0] req@ffff8f168f37f200 x1636713474396208/t0(0) o104->fir-MDT0002@10.8.7.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561416498 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 15:48:18 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 59 
previous similar messages Jun 24 15:48:19 fir-md1-s1 kernel: Lustre: 21433:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f16770ab300 x1631538709029264/t0(0) o101->ca15d879-1cb2-8780-e5e2-20230d9e27cf@10.8.28.3@o2ib6:24/0 lens 576/3264 e 0 to 0 dl 1561416504 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:48:19 fir-md1-s1 kernel: Lustre: 21433:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jun 24 15:48:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b37c54be-7fed-724b-d760-c5bd71b2a4e0 (at 10.8.29.5@o2ib6) reconnecting Jun 24 15:48:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jun 24 15:48:30 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ec0abd600 Jun 24 15:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with b37c54be-7fed-724b-d760-c5bd71b2a4e0 (at 10.8.29.5@o2ib6), client will retry: rc = -110 Jun 24 15:48:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f222778b400 Jun 24 15:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b37c54be-7fed-724b-d760-c5bd71b2a4e0 (at 10.8.29.5@o2ib6), client will retry: rc -110 Jun 24 15:48:36 fir-md1-s1 kernel: LustreError: 22156:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1f77339450 x1636570040419104/t0(0) o4->a6d577d8-fd68-2a67-a952-7c8d9e354cb8@10.8.8.24@o2ib6:2/0 lens 488/448 e 0 to 0 dl 1561416542 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:48:36 fir-md1-s1 kernel: LustreError: 22156:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jun 24 15:48:37 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:48:37 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) 
Skipped 52 previous similar messages Jun 24 15:48:44 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17e7b06200 Jun 24 15:48:47 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1595ef9200 Jun 24 15:48:55 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a75e08000 Jun 24 15:48:55 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f160e218400 Jun 24 15:48:55 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17ab7d0000 Jun 24 15:48:57 fir-md1-s1 kernel: Lustre: 21368:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561416530/real 0] req@ffff8f10419b1800 x1636713474551312/t0(0) o106->fir-MDT0000@10.8.27.24@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561416537 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 15:48:57 fir-md1-s1 kernel: Lustre: 21368:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 24 15:49:07 fir-md1-s1 kernel: Lustre: 97670:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f163b712400 x1635340613822832/t0(0) o101->c1c54f8a-db68-72ea-1f4f-3dc905e7ab7d@10.8.1.16@o2ib6:12/0 lens 480/568 e 0 to 0 dl 1561416552 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:49:07 fir-md1-s1 kernel: Lustre: 97670:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jun 24 15:49:08 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1635fbe200 Jun 24 15:49:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6), client will retry: rc = -110 Jun 24 15:49:08 
fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 24 15:49:15 fir-md1-s1 kernel: LustreError: 22648:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1a8aff3050 x1636418132959360/t0(0) o4->304180e1-aa68-a4a4-ed4c-9536f53351a5@10.8.1.21@o2ib6:9/0 lens 488/448 e 0 to 0 dl 1561416579 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:49:15 fir-md1-s1 kernel: LustreError: 22648:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 7 previous similar messages Jun 24 15:49:15 fir-md1-s1 kernel: Lustre: 23556:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f06ce310c00 x1636996283584528/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:20/0 lens 480/568 e 0 to 0 dl 1561416560 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:49:21 fir-md1-s1 kernel: Lustre: 21368:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f06ce310c00 x1636996283584528/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:20/0 lens 480/536 e 0 to 0 dl 1561416560 ref 1 fl Complete:/0/0 rc 301/301 Jun 24 15:49:27 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f243d644200 Jun 24 15:49:27 fir-md1-s1 kernel: Lustre: 97658:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1667215100 x1631309731138176/t0(0) o101->2defae61-8bf0-dee6-7d48-53b83a69e973@10.8.17.24@o2ib6:2/0 lens 1808/3288 e 0 to 0 dl 1561416572 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:49:27 fir-md1-s1 kernel: Lustre: 97658:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jun 24 15:49:27 fir-md1-s1 kernel: Lustre: 24578:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3s); client may timeout. 
req@ffff8f160ae8a400 x1631557242055184/t348692003537(0) o101->84fd8c4b-6545-cd41-282d-ef5f651cba30@10.8.17.11@o2ib6:24/0 lens 1776/1192 e 0 to 0 dl 1561416564 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 15:49:27 fir-md1-s1 kernel: Lustre: 24578:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jun 24 15:49:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.1@o2ib6, removing former export from same NID Jun 24 15:49:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1adebcda00 Jun 24 15:49:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f162bb70600 Jun 24 15:49:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc -110 Jun 24 15:49:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.20.11@o2ib6, removing former export from same NID Jun 24 15:49:34 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jun 24 15:49:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.12@o2ib6, removing former export from same NID Jun 24 15:49:42 fir-md1-s1 kernel: Lustre: Skipped 187 previous similar messages Jun 24 15:49:42 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f167abf0c00 Jun 24 15:49:42 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1629310600 Jun 24 15:49:43 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f217a561e00 x1631575960598752/t0(0) o101->4dc6ad45-c67c-15d0-5638-611b0defe5f9@10.8.16.2@o2ib6:18/0 lens 376/1600 e 0 to 0 dl 1561416588 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:49:43 fir-md1-s1 kernel: 
Lustre: 22283:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Jun 24 15:49:48 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1fd244c200 Jun 24 15:49:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1cc715be00 Jun 24 15:49:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3091efac00 Jun 24 15:49:53 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f25311d5e00 Jun 24 15:49:54 fir-md1-s1 kernel: Lustre: 97670:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:15s); client may timeout. req@ffff8f19f1ad8300 x1636669927723920/t348691996062(0) o36->cea6adbc-46ce-842f-a429-3350fc5db284@10.8.18.26@o2ib6:9/0 lens 488/424 e 0 to 0 dl 1561416579 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 15:49:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1cfcaf8900/0x5d9ee6233f217b96 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 97 type: IBT flags: 0x60200400000020 nid: 10.8.27.22@o2ib6 remote: 0x4deb3a7a8dd7d1fe expref: 345 pid: 97645 timeout: 531656 lvb_type: 0 Jun 24 15:49:57 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:24s); client may timeout. 
req@ffff8f1667217500 x1635086170030736/t348692019075(0) o101->bc83c7c5-08aa-b1e5-1dd5-b1a51ba5cb4a@10.8.1.15@o2ib6:2/0 lens 1776/1192 e 0 to 0 dl 1561416572 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 15:49:57 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Jun 24 15:49:58 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1970efe600 Jun 24 15:49:58 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20532ca400 Jun 24 15:49:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.9@o2ib6, removing former export from same NID Jun 24 15:49:59 fir-md1-s1 kernel: Lustre: Skipped 323 previous similar messages Jun 24 15:49:59 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f29b4673200 Jun 24 15:50:02 fir-md1-s1 kernel: Lustre: 97645:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561416592/real 0] req@ffff8f23597db900 x1636713474808752/t0(0) o104->fir-MDT0002@10.8.17.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561416602 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 24 15:50:02 fir-md1-s1 kernel: Lustre: 97645:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 31 previous similar messages Jun 24 15:50:02 fir-md1-s1 kernel: Lustre: 97671:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:18s); client may timeout. 
req@ffff8f1624b33000 x1631595884900736/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:14/0 lens 376/944 e 0 to 0 dl 1561416584 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 15:50:02 fir-md1-s1 kernel: Lustre: 97671:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jun 24 15:50:04 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e4de16000 Jun 24 15:50:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 51s: evicting client at 10.8.8.24@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f21d5d660c0/0x5d9ee6233b58597c lrc: 3/0,0 mode: PR/PR res: [0x2c002bf5a:0x5c34:0x0].0x0 bits 0x5b/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.8.24@o2ib6 remote: 0xc0455945f6b89b52 expref: 11824 pid: 20730 timeout: 531665 lvb_type: 0 Jun 24 15:50:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 24 15:50:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 52s: evicting client at 10.8.16.2@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1f5301bf00/0x5d9ee6233e9230c5 lrc: 3/0,0 mode: CR/CR res: [0x2c002be48:0x104df:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.16.2@o2ib6 remote: 0x24cf00c2f87a7f94 expref: 1910 pid: 26256 timeout: 531670 lvb_type: 0 Jun 24 15:50:10 fir-md1-s1 kernel: LustreError: 24579:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2501a6b400 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ceea72f40/0x5d9ee6233f3f88e3 lrc: 1/0,0 mode: EX/EX res: [0x2c002be48:0x104df:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x54801000000000 nid: 10.8.16.2@o2ib6 remote: 0x24cf00c2f87a8004 expref: 1296 pid: 24579 timeout: 0 lvb_type: 3 Jun 24 15:50:10 fir-md1-s1 kernel: LustreError: 
24579:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Jun 24 15:50:11 fir-md1-s1 kernel: Lustre: 24579:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:22s); client may timeout. req@ffff8f217a561e00 x1631575960598752/t348692025511(0) o101->4dc6ad45-c67c-15d0-5638-611b0defe5f9@10.8.16.2@o2ib6:18/0 lens 376/1568 e 0 to 0 dl 1561416588 ref 1 fl Complete:/0/0 rc -107/-107 Jun 24 15:50:11 fir-md1-s1 kernel: Lustre: 24579:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jun 24 15:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to f590aa0d-878d-f7af-2791-1d94ccac0e1f (at 10.8.18.1@o2ib6) Jun 24 15:50:13 fir-md1-s1 kernel: Lustre: Skipped 769 previous similar messages Jun 24 15:50:13 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1653697c00 Jun 24 15:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with cea6adbc-46ce-842f-a429-3350fc5db284 (at 10.8.18.26@o2ib6), client will retry: rc = -110 Jun 24 15:50:13 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jun 24 15:50:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 54s: evicting client at 10.8.29.5@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f19a86357c0/0x5d9ee6233ea985f5 lrc: 3/0,0 mode: PR/PR res: [0x20002993d:0x274:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.29.5@o2ib6 remote: 0xc606c8a810cda247 expref: 104 pid: 22007 timeout: 531675 lvb_type: 0 Jun 24 15:50:15 fir-md1-s1 kernel: Lustre: 97660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f163caff200 x1631695367410864/t0(0) o101->e0767d77-866c-9038-3794-0af657e399d1@10.8.8.22@o2ib6:20/0 lens 1936/3288 e 0 to 0 dl 1561416620 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:50:15 
fir-md1-s1 kernel: Lustre: 97660:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 50 previous similar messages Jun 24 15:50:16 fir-md1-s1 kernel: LustreError: 23455:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f34ed706400 ns: mdt-fir-MDT0000_UUID lock: ffff8f232b219200/0x5d9ee6233f441c8a lrc: 3/0,0 mode: PW/PW res: [0x20002993d:0x274:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.29.5@o2ib6 remote: 0xc606c8a810cda2e8 expref: 79 pid: 23455 timeout: 0 lvb_type: 0 Jun 24 15:50:19 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f163ff40600 Jun 24 15:50:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.17.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f21d7d81d40/0x5d9ee6233f32b986 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 82 type: IBT flags: 0x60200400000020 nid: 10.8.17.11@o2ib6 remote: 0x23a0b048f5b281f7 expref: 749 pid: 97638 timeout: 531686 lvb_type: 0 Jun 24 15:50:28 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1a8aff3c50 x1631543243076544/t0(0) o4->20ffa3e6-2ce8-ff35-0cee-96ba2468fd67@10.8.17.13@o2ib6:12/0 lens 488/448 e 0 to 0 dl 1561416642 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:50:28 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 14 previous similar messages Jun 24 15:50:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.27.1@o2ib6, removing former export from same NID Jun 24 15:50:31 fir-md1-s1 kernel: Lustre: Skipped 337 previous similar messages Jun 24 15:50:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 044042bf-dd57-7ee7-fd56-cb18003c928b (at 10.8.7.32@o2ib6) reconnecting Jun 24 15:50:34 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jun 24 15:50:36 
fir-md1-s1 kernel: Lustre: 97667:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:23s); client may timeout. req@ffff8f1bd273e600 x1631562893236800/t0(0) o101->69e867f7-2c34-9281-0411-6ff880d43ef5@10.8.28.11@o2ib6:13/0 lens 384/1040 e 0 to 0 dl 1561416613 ref 1 fl Complete:/0/0 rc 0/0 Jun 24 15:50:36 fir-md1-s1 kernel: Lustre: 97667:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Jun 24 15:50:37 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f237fb96400 Jun 24 15:50:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f167abf1400 Jun 24 15:50:41 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20968b2e00 Jun 24 15:50:44 fir-md1-s1 kernel: LustreError: 97669:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416554, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1ce7674380/0x5d9ee6233f3ad679 lrc: 3/0,1 mode: --/CW res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x2/0x0 rrc: 75 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97669 timeout: 0 lvb_type: 0 Jun 24 15:50:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 24 15:50:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 90 previous similar messages Jun 24 15:50:47 fir-md1-s1 kernel: LustreError: 22289:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416557, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1706f3da00/0x5d9ee6233f3e7aa6 lrc: 3/1,0 mode: --/PR res: 
[0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 75 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22289 timeout: 0 lvb_type: 0 Jun 24 15:50:47 fir-md1-s1 kernel: LustreError: 22289:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jun 24 15:50:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 56s: evicting client at 10.8.27.3@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3fd871ad00/0x5d9ee6233efe9fae lrc: 3/0,0 mode: PR/PR res: [0x2c002bea6:0x1e36b:0x0].0x0 bits 0x13/0x0 rrc: 80 type: IBT flags: 0x60200400000020 nid: 10.8.27.3@o2ib6 remote: 0xf651ae946746c380 expref: 129 pid: 20722 timeout: 531708 lvb_type: 0 Jun 24 15:50:49 fir-md1-s1 kernel: LustreError: 50445:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416559, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f21aa0086c0/0x5d9ee6233f412f0c lrc: 3/1,0 mode: --/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 73 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 50445 timeout: 0 lvb_type: 0 Jun 24 15:50:50 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1626724600 Jun 24 15:50:56 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f3201a000 Jun 24 15:51:03 fir-md1-s1 kernel: LustreError: 25082:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.7.32@o2ib6 arrived at 1561416663 with bad export cookie 6746082289100437273 Jun 24 15:51:06 fir-md1-s1 kernel: LustreError: 21712:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1a8aff1450 x1637258515994832/t0(0) o3->b09d4c25-b109-b30c-132e-6a644105be34@10.8.9.9@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1561416666 ref 1 fl 
Interpret:/0/0 rc 0/0 Jun 24 15:51:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b09d4c25-b109-b30c-132e-6a644105be34 (at 10.8.9.9@o2ib6), client will retry: rc -110 Jun 24 15:51:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 24 15:51:07 fir-md1-s1 kernel: LustreError: 42895:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f21343b8450 x1634352865824384/t0(0) o4->eb079895-c48f-19eb-1198-2b2f152dbaf1@10.8.26.34@o2ib6:7/0 lens 488/448 e 0 to 0 dl 1561416667 ref 1 fl Interpret:/2/0 rc 0/0 Jun 24 15:51:08 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a6880a400 Jun 24 15:51:08 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f161ff6dc00 Jun 24 15:51:08 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d7b9b7e00 Jun 24 15:51:08 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1dcf7f8c50 x1631567245092016/t0(0) o4->c85b79ba-f35a-df4c-7ce6-3db4837c1dc9@10.8.18.1@o2ib6:8/0 lens 488/448 e 0 to 0 dl 1561416668 ref 1 fl Interpret:/2/0 rc 0/0 Jun 24 15:51:08 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jun 24 15:51:09 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1696a6fc00 Jun 24 15:51:09 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f1dcf7f8c50 x1631567245092016/t0(0) o4->c85b79ba-f35a-df4c-7ce6-3db4837c1dc9@10.8.18.1@o2ib6:8/0 lens 488/448 e 0 to 0 dl 1561416668 ref 1 fl Complete:/2/ffffffff rc -110/-1 Jun 24 15:51:09 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Jun 24 15:51:10 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16a2e9d800 Jun 24 15:51:10 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21437a6e00 Jun 24 15:51:11 fir-md1-s1 kernel: LustreError: 22648:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1f7733ac50 x1636569714132368/t0(0) o4->5d60b790-0b15-ff01-65b5-d8a0250b0e53@10.8.1.29@o2ib6:11/0 lens 488/448 e 0 to 0 dl 1561416671 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:51:11 fir-md1-s1 kernel: LustreError: 22648:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jun 24 15:51:14 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2348eaa400 Jun 24 15:51:15 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1891e22000 Jun 24 15:51:15 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f746adc00 Jun 24 15:51:16 fir-md1-s1 kernel: LustreError: 20461:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.29.6@o2ib6) failed to reply to blocking AST (req@ffff8f161ca61b00 x1636713475012736 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f22fdbb0000/0x5d9ee6233f474ec6 lrc: 4/0,0 mode: EX/EX res: [0x2c002bf84:0x9313:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.8.29.6@o2ib6 remote: 0xcb7f8716e1872de0 expref: 14900 pid: 22004 timeout: 531732 lvb_type: 3 Jun 24 15:51:16 
fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.29.6@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 24 15:51:16 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1ee25a3450 x1631683569568144/t0(0) o4->a82097ea-0a83-cc99-985b-882074216844@10.8.12.13@o2ib6:16/0 lens 504/448 e 0 to 0 dl 1561416676 ref 1 fl Interpret:/2/0 rc 0/0 Jun 24 15:51:16 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jun 24 15:51:16 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1df1688400 Jun 24 15:51:17 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20cd79f400 Jun 24 15:51:19 fir-md1-s1 kernel: Lustre: 25998:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1974ba8050 x1634927199608064/t0(0) o4->8e6b7782-0f04-da33-0138-eab1c9e41ffb@10.8.18.25@o2ib6:24/0 lens 488/448 e 0 to 0 dl 1561416684 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:51:19 fir-md1-s1 kernel: Lustre: 25998:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 120 previous similar messages Jun 24 15:51:20 fir-md1-s1 kernel: LustreError: 24584:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416590, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f216ce20000/0x5d9ee6233f6e6dd5 lrc: 3/0,1 mode: --/CW res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x2/0x0 rrc: 69 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24584 timeout: 0 lvb_type: 0 Jun 24 15:51:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ece1a0200 Jun 24 15:51:23 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1706f3da00/0x5d9ee6233f3e7aa6 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 71 type: IBT flags: 0x60200400000020 nid: 10.8.8.17@o2ib6 remote: 0x68316722491f52a3 expref: 2391 pid: 22289 timeout: 531743 lvb_type: 0 Jun 24 15:51:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 24 15:51:24 fir-md1-s1 kernel: LustreError: 83752:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f3438f70c00 x1636713475180944/t0(0) o105->fir-MDT0002@10.8.28.11@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 24 15:51:24 fir-md1-s1 kernel: LustreError: 21434:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f25350ae800 ns: mdt-fir-MDT0002_UUID lock: ffff8f3227283f00/0x5d9ee6233f70e4d1 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 55 type: IBT flags: 0x50200000000000 nid: 10.8.28.3@o2ib6 remote: 0x8a5f985bbadec0dc expref: 7 pid: 21434 timeout: 0 lvb_type: 0 Jun 24 15:51:24 fir-md1-s1 kernel: LustreError: 21497:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk WRITE req@ffff8f180f6a2c50 x1631538709055328/t0(0) o4->ca15d879-1cb2-8780-e5e2-20230d9e27cf@10.8.28.3@o2ib6:16/0 lens 488/448 e 0 to 0 dl 1561416706 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:51:25 fir-md1-s1 kernel: LustreError: 46590:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1974ba8050 x1634927199608064/t0(0) o4->8e6b7782-0f04-da33-0138-eab1c9e41ffb@10.8.18.25@o2ib6:24/0 lens 488/448 e 0 to 0 dl 1561416684 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:51:25 fir-md1-s1 kernel: LustreError: 46590:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 9 previous similar messages Jun 24 15:51:25 fir-md1-s1 kernel: LustreError: 
20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f209d4b1c00 Jun 24 15:51:25 fir-md1-s1 kernel: LustreError: 22136:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.28.11@o2ib6 arrived at 1561416685 with bad export cookie 6746082289092222801 Jun 24 15:51:26 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1666b9ce00 Jun 24 15:51:26 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f208f03b000 Jun 24 15:51:26 fir-md1-s1 kernel: LustreError: 23103:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.26@o2ib6 arrived at 1561416686 with bad export cookie 6746082289097843395 Jun 24 15:51:28 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f248da4a600 Jun 24 15:51:28 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2356cf4a00 Jun 24 15:51:29 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16fab25c00 Jun 24 15:51:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1632bfc800 Jun 24 15:51:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c5c907a00 Jun 24 15:51:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6), client will retry: rc -110 Jun 24 15:51:30 fir-md1-s1 kernel: LustreError: 20722:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4505753000 ns: mdt-fir-MDT0002_UUID lock: ffff8f17a141da00/0x5d9ee6233fcc727c lrc: 1/0,0 mode: EX/EX res: [0x2c002bf83:0xe7e7:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.29.5@o2ib6 
remote: 0xc606c8a810cda319 expref: 12 pid: 20722 timeout: 0 lvb_type: 3 Jun 24 15:51:30 fir-md1-s1 kernel: LustreError: 25074:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.5@o2ib6 arrived at 1561416690 with bad export cookie 6746082289097820148 Jun 24 15:51:30 fir-md1-s1 kernel: LustreError: 20722:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jun 24 15:51:31 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f167a83e600 Jun 24 15:51:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1cd303b400 Jun 24 15:51:33 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f162cfea000 Jun 24 15:51:33 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1af4245e00 Jun 24 15:51:34 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f162cfe8e00 Jun 24 15:51:34 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34cb63a800 Jun 24 15:51:34 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ceab74e00 Jun 24 15:51:34 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1628a03600 Jun 24 15:51:34 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d3aec7800 Jun 24 15:51:35 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1628a07a00 Jun 24 15:51:35 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f179ab17800 Jun 24 15:51:36 
fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4490bf9800 Jun 24 15:51:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.24@o2ib6, removing former export from same NID Jun 24 15:51:36 fir-md1-s1 kernel: Lustre: Skipped 724 previous similar messages Jun 24 15:51:36 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a55fccc00 Jun 24 15:51:36 fir-md1-s1 kernel: LustreError: 22891:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.26@o2ib6 arrived at 1561416696 with bad export cookie 6746082289097843395 Jun 24 15:51:36 fir-md1-s1 kernel: LustreError: 22891:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 5 previous similar messages Jun 24 15:51:36 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1da986f800 Jun 24 15:51:38 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f204374ba00 Jun 24 15:51:39 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2b8438ca00 Jun 24 15:51:39 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33c9620e00 Jun 24 15:51:39 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24ab328800 Jun 24 15:51:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33c9626000 Jun 24 15:51:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d59e1fc00 Jun 24 15:51:41 fir-md1-s1 kernel: LustreError: 20721:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f24fe577800 ns: mdt-fir-MDT0002_UUID lock: 
ffff8f0946a6bf00/0x5d9ee6233fe3d0a4 lrc: 3/0,0 mode: PW/PW res: [0x2c002bf5a:0x62a9:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x50200000000000 nid: 10.8.17.15@o2ib6 remote: 0x5909d6587625933f expref: 3 pid: 20721 timeout: 0 lvb_type: 0 Jun 24 15:51:41 fir-md1-s1 kernel: LustreError: 20721:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jun 24 15:51:41 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3091ef9c00 Jun 24 15:51:41 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1eeae51c00 Jun 24 15:51:41 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3a9fefa800 Jun 24 15:51:41 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f167ee3e200 Jun 24 15:51:42 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1fffdaf200 Jun 24 15:51:42 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f8fb80e00 Jun 24 15:51:42 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f167ee3ee00 Jun 24 15:51:42 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1fffdafe00 Jun 24 15:51:42 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1fffda9200 Jun 24 15:51:42 fir-md1-s1 kernel: LustreError: 21389:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1ee25a5050 x1631564825922512/t0(0) o4->04031d35-e75a-0623-0a2e-3f8a84f80ab5@10.8.27.15@o2ib6:12/0 lens 488/448 e 0 to 0 dl 1561416702 ref 1 fl Interpret:/0/0 rc 0/0 Jun 24 15:51:42 fir-md1-s1 
kernel: LustreError: 21389:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 22 previous similar messages Jun 24 15:51:43 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1647e77c00 Jun 24 15:51:43 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f164af5c000 Jun 24 15:51:43 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ed9e8fe00 Jun 24 15:51:43 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2504f73400 Jun 24 15:51:43 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d7b9b4a00 Jun 24 15:51:43 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22e4b21000 Jun 24 15:51:45 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b3ca2f600 Jun 24 15:51:45 fir-md1-s1 kernel: LustreError: 23101:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.28.11@o2ib6 arrived at 1561416705 with bad export cookie 6746082289092222801 Jun 24 15:51:45 fir-md1-s1 kernel: LustreError: 23101:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Jun 24 15:51:48 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17bb5fb000 Jun 24 15:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b09d4c25-b109-b30c-132e-6a644105be34 (at 10.8.9.9@o2ib6), client will retry: rc -110 Jun 24 15:51:48 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 24 15:51:48 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2536ed6400 Jun 24 15:51:49 fir-md1-s1 kernel: 
LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f182d76fe00 Jun 24 15:51:51 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f182d70fe00 Jun 24 15:51:57 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1798378400 Jun 24 15:52:05 fir-md1-s1 kernel: LustreError: 50446:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.10@o2ib6) failed to reply to blocking AST (req@ffff8f1fb8f0c500 x1636713475156992 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1e163d0b40/0x5d9ee622da3415bb lrc: 4/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 1234 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0x9ed6a5314c69ab45 expref: 766118 pid: 24587 timeout: 531794 lvb_type: 0 Jun 24 15:52:05 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.10@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 24 15:52:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 49s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e163d0b40/0x5d9ee622da3415bb lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 1233 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0x9ed6a5314c69ab45 expref: 766095 pid: 24587 timeout: 0 lvb_type: 0 Jun 24 15:52:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Jun 24 15:52:05 fir-md1-s1 kernel: LustreError: 20930:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1561416725 with bad export cookie 6746082289090716541 Jun 24 15:52:05 fir-md1-s1 kernel: LustreError: 20930:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 15 previous similar messages Jun 24 15:52:15 
fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2305afe200 Jun 24 15:52:15 fir-md1-s1 kernel: Lustre: 21896:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:5s); client may timeout. req@ffff8f24f4914450 x1635709199425856/t0(0) o37->09fe1fc8-d186-6314-b715-72bcbbf4dcb1@10.8.1.35@o2ib6:10/0 lens 448/408 e 1 to 0 dl 1561416730 ref 1 fl Complete:/0/0 rc -110/-110 Jun 24 15:52:15 fir-md1-s1 kernel: Lustre: 21896:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 78 previous similar messages Jun 24 15:52:16 fir-md1-s1 kernel: LustreError: 97641:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.0.65@o2ib6: deadline 30:4s ago req@ffff8f18a2079500 x1634092354836336/t0(0) o101->87da5719-38f8-e25f-27bd-899baebba0f4@10.8.0.65@o2ib6:12/0 lens 576/0 e 0 to 0 dl 1561416732 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jun 24 15:52:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 5d60b790-0b15-ff01-65b5-d8a0250b0e53 (at 10.8.1.29@o2ib6), client will retry: rc = -110 Jun 24 15:52:23 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jun 24 15:52:23 fir-md1-s1 kernel: LustreError: 50444:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.8.12@o2ib6: deadline 30:7s ago req@ffff8f1a72f4cb00 x1634455977415344/t0(0) o101->b95afc0f-d5ce-0d5e-e5e9-03cd8d169d60@10.8.8.12@o2ib6:16/0 lens 576/0 e 0 to 0 dl 1561416736 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jun 24 15:52:23 fir-md1-s1 kernel: LustreError: 50444:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 89 previous similar messages Jun 24 15:52:26 fir-md1-s1 kernel: LustreError: 97666:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.17.9@o2ib6: deadline 30:3s ago req@ffff8f1d20601e00 x1635343772185568/t0(0) 
o101->51002e48-a06e-3405-fcaa-ac377ed743af@10.8.17.9@o2ib6:23/0 lens 576/0 e 0 to 0 dl 1561416743 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jun 24 15:52:26 fir-md1-s1 kernel: LustreError: 97666:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 474 previous similar messages Jun 24 15:52:46 fir-md1-s1 kernel: LustreError: 10197:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416676, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f13baa669c0/0x5d9ee6233fef45b2 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10197 timeout: 0 lvb_type: 0 Jun 24 15:52:46 fir-md1-s1 kernel: LustreError: 10197:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 25 previous similar messages Jun 24 15:52:55 fir-md1-s1 kernel: LustreError: 23582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416684, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3c12216780/0x5d9ee6233fefc9f5 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23582 timeout: 0 lvb_type: 0 Jun 24 15:52:55 fir-md1-s1 kernel: LustreError: 23582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 148 previous similar messages Jun 24 15:53:10 fir-md1-s1 kernel: LustreError: 21415:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.106.17@o2ib4: deadline 30:1s ago req@ffff8f2f45f40c00 x1634122400223376/t0(0) o101->459a4674-896d-e57f-5fbe-6e6932e88880@10.9.106.17@o2ib4:9/0 lens 576/0 e 0 to 0 dl 1561416789 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jun 24 15:53:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.9.10@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 24 15:53:10 fir-md1-s1 kernel: LustreError: 21415:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 210 previous similar messages Jun 24 15:53:14 fir-md1-s1 kernel: LustreError: 22282:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f162a623c00 x1636713475360464/t0(0) o104->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 24 15:53:14 fir-md1-s1 kernel: LustreError: 22282:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jun 24 15:53:30 fir-md1-s1 kernel: LustreError: 20462:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f18dbffb000 x1636713475556288/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 24 15:53:39 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d5876e000 x1636443218517136/t0(0) o101->7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0@10.8.29.6@o2ib6:14/0 lens 1784/3288 e 0 to 0 dl 1561416824 ref 2 fl Interpret:/0/0 rc 0/0 Jun 24 15:53:39 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2191 previous similar messages Jun 24 15:53:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.35@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2356b0d580/0x5d9ee62327c3cc87 lrc: 3/0,0 mode: PR/PR res: [0x2c002be88:0xe04f:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.35@o2ib6 remote: 0xe7fd3d175f79dfa5 expref: 71235 pid: 21481 timeout: 531883 lvb_type: 0 Jun 24 15:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to bdf06334-3a1e-8f45-20cb-38a64ac80139 (at 10.8.29.5@o2ib6) Jun 24 15:54:32 fir-md1-s1 
kernel: Lustre: Skipped 2498 previous similar messages Jun 24 15:55:00 fir-md1-s1 kernel: LustreError: 20726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416810, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1ee5f38240/0x5d9ee6234033997e lrc: 3/0,1 mode: --/PW res: [0x2000222aa:0x10e:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20726 timeout: 0 lvb_type: 0 Jun 24 15:55:00 fir-md1-s1 kernel: LustreError: 20462:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561416810, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f1ab94380/0x5d9ee62340339970 lrc: 3/0,1 mode: --/PW res: [0x200025b09:0x2437:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20462 timeout: 0 lvb_type: 0 Jun 24 15:55:00 fir-md1-s1 kernel: LustreError: 20462:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 216 previous similar messages Jun 24 15:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b37c54be-7fed-724b-d760-c5bd71b2a4e0 (at 10.8.29.5@o2ib6) reconnecting Jun 24 15:55:03 fir-md1-s1 kernel: Lustre: Skipped 998 previous similar messages Jun 24 15:57:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) in 230 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22f15df000, cur 1561417020 expire 1561416870 last 1561416790 Jun 24 16:14:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 24 16:14:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 24 21:20:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jun 24 21:21:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ae85bd6d-3abb-15dd-50c5-ec36d3fe0421 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1867500400, cur 1561436462 expire 1561436312 last 1561436235 Jun 24 22:52:55 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jun 24 22:52:55 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 88 previous similar messages Jun 25 09:03:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d4a6325e-22ba-0473-b0bb-1ac629cc9b52 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2501b00c00, cur 1561478620 expire 1561478470 last 1561478393 Jun 25 09:03:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 09:03:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jun 25 09:03:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 09:07:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jun 25 09:07:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2ae4f990-b2cb-626b-12c1-a51b5888422d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f28e523c400, cur 1561478853 expire 1561478703 last 1561478626 Jun 25 09:07:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 12:40:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 43eb156b-cf2c-6d44-b021-842e2a3ba6bf (at 10.8.14.1@o2ib6) Jun 25 12:40:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 12:41:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2900ab2e-d5c8-984c-4497-834ead5e0c0c (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f3ea0000, cur 1561491675 expire 1561491525 last 1561491448 Jun 25 12:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2900ab2e-d5c8-984c-4497-834ead5e0c0c (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252023b800, cur 1561491686 expire 1561491536 last 1561491459 Jun 25 12:41:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 12:53:37 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 25 12:53:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) reconnecting Jun 25 12:53:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 12:53:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 25 12:53:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 13:37:48 fir-md1-s1 kernel: Lustre: 21370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0b9d97b600 x1634476177653344/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:23/0 lens 480/568 e 0 to 0 dl 1561495073 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 13:37:48 fir-md1-s1 kernel: Lustre: 21370:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 
4 previous similar messages Jun 25 13:37:52 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3164e28480/0x5d9ee62524f94c87 lrc: 3/0,0 mode: PW/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3e6f4a8b90 expref: 23720 pid: 23454 timeout: 610132 lvb_type: 0 Jun 25 13:37:52 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jun 25 13:37:53 fir-md1-s1 kernel: LustreError: 20369:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.31@o2ib6 arrived at 1561495073 with bad export cookie 6746082289096703970 Jun 25 13:37:53 fir-md1-s1 kernel: LustreError: 20369:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 5917 previous similar messages Jun 25 13:37:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jun 25 13:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 13:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 13:40:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14c7771c00, cur 1561495206 expire 1561495056 last 1561494979 Jun 25 13:40:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4537985c00, cur 1561495219 expire 1561495069 last 1561494992 Jun 25 13:41:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) in 222 seconds. I think it's dead, and I am evicting it. exp ffff8f162e7b6800, cur 1561495295 expire 1561495145 last 1561495073 Jun 25 13:41:36 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1a80bd7200 x1631309824666512/t0(0) o101->2defae61-8bf0-dee6-7d48-53b83a69e973@10.8.17.24@o2ib6:11/0 lens 480/568 e 0 to 0 dl 1561495301 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 13:41:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jun 25 13:41:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.7.8@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f20c534ad00/0x5d9ee62524f5a84c lrc: 3/0,0 mode: PW/PW res: [0x2c002c286:0x916e:0x0].0x0 bits 0x40/0x0 rrc: 24 type: IBT flags: 0x60200400000020 nid: 10.8.7.8@o2ib6 remote: 0x9a03b0d8ce0febf6 expref: 351 pid: 97651 timeout: 610360 lvb_type: 0 Jun 25 13:41:43 fir-md1-s1 kernel: LustreError: 25084:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.7.8@o2ib6 arrived at 1561495303 with bad export cookie 6746082289090927066 Jun 25 13:41:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d5145b19-7e77-2465-cb06-19cf549382e1 (at 10.8.7.8@o2ib6) Jun 25 13:41:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24f3e5ac00, cur 1561495315 expire 1561495165 last 1561495088 Jun 25 13:46:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f8fcb29b-d706-0b08-6893-aa94c8d5e667 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f233e810000, cur 1561495577 expire 1561495427 last 1561495350 Jun 25 13:46:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jun 25 13:46:57 fir-md1-s1 kernel: Lustre: 23622:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f351a0f7850 x1634476184772000/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:2/0 lens 480/568 e 1 to 0 dl 1561495622 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 13:47:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 13:47:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 13:47:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 13:47:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.17.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1cff79f740/0x5d9ee62527f6acb5 lrc: 3/0,0 mode: PW/PW res: [0x2c002c286:0x915a:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.8.17.15@o2ib6 remote: 0x5909d658763498c3 expref: 741 pid: 97643 timeout: 610688 lvb_type: 0 Jun 25 13:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 13:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 13:47:24 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 25 13:48:10 fir-md1-s1 kernel: Lustre: MGS: 
Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Jun 25 13:48:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 13:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) reconnecting Jun 25 13:48:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 13:48:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 13:48:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 25 13:48:41 fir-md1-s1 kernel: Lustre: Skipped 461 previous similar messages Jun 25 14:06:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jun 25 14:06:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 25 14:06:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d5fc548e-054d-12d9-54b9-977767ad7c03 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f16536ff400, cur 1561496817 expire 1561496667 last 1561496590 Jun 25 14:06:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:12:50 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561497163/real 1561497163] req@ffff8f1073b68300 x1636714409925136/t0(0) o104->fir-MDT0002@10.8.14.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561497170 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 25 14:12:50 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 534 previous similar messages Jun 25 14:12:58 fir-md1-s1 kernel: Lustre: 23598:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f07c3ec0c00 x1631601318285360/t0(0) o101->f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb@10.9.112.4@o2ib4:3/0 lens 1784/3288 e 1 to 0 dl 1561497183 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 14:12:58 fir-md1-s1 kernel: Lustre: 23598:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jun 25 14:13:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb (at 10.9.112.4@o2ib4) reconnecting Jun 25 14:13:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:13:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 4ac555d7-5727-5203-83f8-102dd77ed0e4 (at 10.9.112.4@o2ib4) Jun 25 14:13:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:13:11 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561497184/real 1561497184] req@ffff8f1073b68300 x1636714409925136/t0(0) o104->fir-MDT0002@10.8.14.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561497191 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 14:13:11 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Jun 25 14:13:25 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb (at 10.9.112.4@o2ib4) reconnecting Jun 25 14:13:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 4ac555d7-5727-5203-83f8-102dd77ed0e4 (at 10.9.112.4@o2ib4) Jun 25 14:13:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:13:46 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561497219/real 1561497219] req@ffff8f1073b68300 x1636714409925136/t0(0) o104->fir-MDT0002@10.8.14.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561497226 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 14:13:46 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Jun 25 14:13:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb (at 10.9.112.4@o2ib4) reconnecting Jun 25 14:13:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:14:07 fir-md1-s1 kernel: Lustre: 10149:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f161ea8c200 x1631676600998688/t0(0) o101->92ffa420-d747-a973-baf2-68cec64e7e81@10.9.113.14@o2ib4:12/0 lens 1784/3288 e 1 to 0 dl 1561497252 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 14:14:07 fir-md1-s1 kernel: Lustre: 10149:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jun 25 14:14:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 4ac555d7-5727-5203-83f8-102dd77ed0e4 (at 10.9.112.4@o2ib4) Jun 25 14:14:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jun 25 14:14:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb (at 10.9.112.4@o2ib4) reconnecting Jun 25 14:14:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jun 25 14:14:36 
fir-md1-s1 kernel: LustreError: 23595:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.14.1@o2ib6) returned error from blocking AST (req@ffff8f1073b68300 x1636714409925136 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f197b5bd580/0x5d9ee6251a154b9d lrc: 4/0,0 mode: PR/PR res: [0x2c002bea6:0x1eebd:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.14.1@o2ib6 remote: 0xd0d7046257247141 expref: 712 pid: 97641 timeout: 612484 lvb_type: 0 Jun 25 14:14:36 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.14.1@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Jun 25 14:14:36 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 113s: evicting client at 10.8.14.1@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1f5065c800/0x5d9ee6251a155903 lrc: 3/0,0 mode: PR/PR res: [0x2c002bea6:0x1ee30:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.14.1@o2ib6 remote: 0xd0d7046257247466 expref: 713 pid: 20545 timeout: 0 lvb_type: 0 Jun 25 14:14:36 fir-md1-s1 kernel: LustreError: 23595:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Jun 25 14:15:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 43eb156b-cf2c-6d44-b021-842e2a3ba6bf (at 10.8.14.1@o2ib6) Jun 25 14:15:51 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jun 25 14:15:57 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client aa5f6715-716f-cf30-713a-acb85093703e (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2501639400, cur 1561497357 expire 1561497207 last 1561497130 Jun 25 14:15:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:20:14 fir-md1-s1 kernel: Lustre: 20555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1af56cad00 x1634476201059968/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:19/0 lens 480/568 e 0 to 0 dl 1561497619 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 14:20:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2521359680/0x5d9ee62534abbefd lrc: 3/0,0 mode: PW/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3e6f4b4086 expref: 23 pid: 20722 timeout: 612678 lvb_type: 0 Jun 25 14:20:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jun 25 14:20:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:22:23 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24e7dd9800, cur 1561497743 expire 1561497593 last 1561497516 Jun 25 14:22:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ebef6758-802b-3d88-0fb7-39f9e3a97c72 (at 10.8.0.67@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2521152400, cur 1561497853 expire 1561497703 last 1561497626 Jun 25 14:26:30 fir-md1-s1 kernel: Lustre: 21668:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f148a6a8c00 x1634476203917216/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:5/0 lens 480/568 e 0 to 0 dl 1561497995 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 14:26:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 14:26:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:26:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 14:26:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.68@o2ib6, removing former export from same NID Jun 25 14:26:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 14:26:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 14:27:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 14:27:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 14:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.68@o2ib6, removing former export from same NID Jun 25 14:27:36 fir-md1-s1 kernel: LustreError: 23653:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561497965, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f153468c380/0x5d9ee62537843d97 lrc: 3/0,1 mode: --/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23653 timeout: 0 lvb_type: 0 Jun 25 14:27:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 14:27:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:28:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2059d557c0/0x5d9ee625376e5444 lrc: 3/0,0 mode: PW/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3e6f4b4fcf expref: 22 pid: 22007 timeout: 613175 lvb_type: 0 Jun 25 14:29:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f10667e1800, cur 1561498187 expire 1561498037 last 1561497960 Jun 25 14:29:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:31:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25218f0400, cur 1561498278 expire 1561498128 last 1561498051 Jun 25 14:35:48 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561498537/real 1561498537] req@ffff8f22c1417200 x1636714432316944/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561498548 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 25 14:35:48 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 34 previous similar messages Jun 25 14:35:55 fir-md1-s1 kernel: Lustre: 25678:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0d7c3b0f00 x1634123299186224/t0(0) o36->6c224fde-2a1b-f3eb-fdf9-6a986a61a55a@10.9.108.4@o2ib4:0/0 lens 536/2888 e 1 to 0 dl 1561498560 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 14:35:59 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561498548/real 1561498548] req@ffff8f22c1417200 x1636714432316944/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561498559 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 14:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6c224fde-2a1b-f3eb-fdf9-6a986a61a55a (at 10.9.108.4@o2ib4) reconnecting Jun 25 14:36:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to cffa9ca6-4860-be91-20b9-abd21a031d37 (at 10.9.108.4@o2ib4) Jun 25 14:36:01 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jun 25 14:36:21 fir-md1-s1 
kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561498570/real 1561498570] req@ffff8f22c1417200 x1636714432316944/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561498581 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 14:36:21 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 25 14:36:54 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561498603/real 1561498603] req@ffff8f22c1417200 x1636714432316944/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561498614 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 14:36:54 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 25 14:37:02 fir-md1-s1 kernel: LustreError: 23716:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561498532, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1f0e79cc80/0x5d9ee62539efccf5 lrc: 3/0,1 mode: --/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23716 timeout: 0 lvb_type: 0 Jun 25 14:37:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Jun 25 14:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Jun 25 14:37:10 fir-md1-s1 kernel: LustreError: 21370:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561498540, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f10d804cc80/0x5d9ee62539f7e3a5 lrc: 
4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee62539f7e3ac expref: -99 pid: 21370 timeout: 0 lvb_type: 0 Jun 25 14:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) reconnecting Jun 25 14:37:10 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jun 25 14:37:28 fir-md1-s1 kernel: LustreError: 97658:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561498558, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f184bf06300/0x5d9ee6253a1036b8 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6253a1036bf expref: -99 pid: 97658 timeout: 0 lvb_type: 0 Jun 25 14:37:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Jun 25 14:37:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:38:00 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561498669/real 1561498669] req@ffff8f22c1417200 x1636714432316944/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561498680 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 14:38:00 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jun 25 14:38:01 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1f699f9680/0x5d9ee62539ecd807 lrc: 3/0,0 mode: PW/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3e6f4b6531 expref: 20 pid: 50445 timeout: 
613741 lvb_type: 0 Jun 25 14:38:11 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.0.65@o2ib6) failed to reply to blocking AST (req@ffff8f22c1417200 x1636714432316944 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f22fcc47740/0x5d9ee6252ea5cf70 lrc: 4/0,0 mode: PR/PR res: [0x2c0000404:0x2d3:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.65@o2ib6 remote: 0xf3ad1a144a9c4e3 expref: 749697 pid: 23455 timeout: 613889 lvb_type: 0 Jun 25 14:38:11 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.0.65@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 25 14:38:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 14:38:12 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498692 with bad export cookie 6746082339115562538 Jun 25 14:38:13 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498693 with bad export cookie 6746082339115562538 Jun 25 14:38:13 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 9 previous similar messages Jun 25 14:38:15 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498695 with bad export cookie 6746082339115562538 Jun 25 14:38:15 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 11 previous similar messages Jun 25 14:38:19 fir-md1-s1 kernel: LustreError: 25078:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498699 with bad export cookie 6746082339115562538 Jun 25 14:38:19 fir-md1-s1 kernel: LustreError: 25078:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 19 previous similar messages Jun 25 14:38:28 fir-md1-s1 kernel: 
LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498708 with bad export cookie 6746082339115562538 Jun 25 14:38:28 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 29 previous similar messages Jun 25 14:38:44 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498724 with bad export cookie 6746082339115562538 Jun 25 14:38:44 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 55 previous similar messages Jun 25 14:38:57 fir-md1-s1 kernel: LNet: Service thread pid 20555 was inactive for 200.17s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 25 14:38:57 fir-md1-s1 kernel: Pid: 20555, comm: mdt01_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 25 14:38:57 fir-md1-s1 kernel: Call Trace: Jun 25 14:38:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x890 [ptlrpc] Jun 25 14:38:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_object_lock_save+0x29/0x50 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_reint_rename+0x4ce/0x2b90 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 25 14:38:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 25 14:38:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 25 14:38:57 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 25 14:38:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 25 14:38:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 25 14:38:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 25 14:38:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561498737.20555 Jun 25 14:38:58 fir-md1-s1 kernel: LNet: Service thread pid 50445 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 25 14:38:58 fir-md1-s1 kernel: Pid: 50445, comm: mdt01_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 25 14:38:58 fir-md1-s1 kernel: Call Trace: Jun 25 14:38:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 25 14:38:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 25 14:38:58 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Jun 25 14:38:58 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 25 14:38:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 25 14:38:58 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 25 14:38:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 25 14:38:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 25 14:38:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 25 14:38:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 25 14:38:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 25 14:38:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 25 14:38:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 25 14:38:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561498738.50445 Jun 25 14:39:00 fir-md1-s1 kernel: LNet: Service thread pid 21370 was inactive for 200.25s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 25 14:39:00 fir-md1-s1 kernel: Pid: 21370, comm: mdt00_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 25 14:39:00 fir-md1-s1 kernel: Call Trace: Jun 25 14:39:00 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 25 14:39:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 25 14:39:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 25 14:39:00 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 25 14:39:00 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 25 14:39:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 25 14:39:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 25 14:39:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 25 14:39:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 25 14:39:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 25 14:39:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 25 14:39:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561498740.21370 Jun 25 14:39:16 fir-md1-s1 kernel: LustreError: 22009:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498756 with bad export cookie 6746082339115562538 Jun 25 14:39:16 fir-md1-s1 kernel: LustreError: 
22009:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 112 previous similar messages Jun 25 14:39:19 fir-md1-s1 kernel: LNet: Service thread pid 97658 was inactive for 200.48s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 25 14:39:19 fir-md1-s1 kernel: Pid: 97658, comm: mdt01_097 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 25 14:39:19 fir-md1-s1 kernel: Call Trace: Jun 25 14:39:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 25 14:39:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 25 14:39:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 25 14:39:19 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 25 14:39:19 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 25 14:39:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 25 14:39:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 25 14:39:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 25 14:39:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 25 14:39:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 25 14:39:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 25 14:39:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561498759.97658 Jun 25 14:39:31 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 6c224fde-2a1b-f3eb-fdf9-6a986a61a55a (at 10.9.108.4@o2ib4) reconnecting Jun 25 14:39:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jun 25 14:39:41 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561498691, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2517b7d7c0/0x5d9ee62539f50611 lrc: 3/0,1 mode: --/CW res: [0x2c0000404:0x2d3:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20555 timeout: 0 lvb_type: 0 Jun 25 14:39:41 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jun 25 14:40:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Jun 25 14:40:20 fir-md1-s1 kernel: LustreError: 25080:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498820 with bad export cookie 6746082339115562538 Jun 25 14:40:20 fir-md1-s1 kernel: LustreError: 25080:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 135 previous similar messages Jun 25 14:40:32 fir-md1-s1 kernel: LNet: Service thread pid 20555 completed after 295.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jun 25 14:40:32 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Jun 25 14:42:38 fir-md1-s1 kernel: LustreError: 25030:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561498958 with bad export cookie 6746082339115562538 Jun 25 14:42:38 fir-md1-s1 kernel: LustreError: 25030:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 333 previous similar messages Jun 25 14:46:55 fir-md1-s1 kernel: LustreError: 22891:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561499215 with bad export cookie 6746082339115562538 Jun 25 14:46:55 fir-md1-s1 kernel: LustreError: 22891:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 559 previous similar messages Jun 25 14:48:11 fir-md1-s1 kernel: Lustre: 10149:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f339c6e1800 x1634476211139808/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:16/0 lens 480/568 e 1 to 0 dl 1561499296 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 14:48:11 fir-md1-s1 kernel: Lustre: 10149:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jun 25 14:48:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 14:48:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jun 25 14:48:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 14:48:17 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jun 25 14:48:52 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.18.30@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0aa8b733c0/0x5d9ee6253d84ebe9 lrc: 3/0,0 mode: PW/PW res: [0x2c002c286:0x22f9:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 
10.8.18.30@o2ib6 remote: 0xb6237814b733a7c0 expref: 347 pid: 21455 timeout: 614392 lvb_type: 0 Jun 25 14:48:52 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 25 14:52:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f0cd2226540/0x5d9ee6253e89b2e2 lrc: 3/0,0 mode: PW/PW res: [0x200025b67:0x15cdc:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.29.8@o2ib6 remote: 0xfccaff921072149f expref: 182 pid: 97658 timeout: 614602 lvb_type: 0 Jun 25 14:52:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jun 25 14:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7b8c2334-5441-fafb-761f-7bfdc2fe1e61 (at 10.8.18.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1de3da3800, cur 1561499578 expire 1561499428 last 1561499351 Jun 25 14:54:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b09d4c25-b109-b30c-132e-6a644105be34 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d4620bc00, cur 1561499675 expire 1561499525 last 1561499448 Jun 25 14:54:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 14:55:44 fir-md1-s1 kernel: LustreError: 21411:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561499654, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2b72f9bcc0/0x5d9ee625401b8723 lrc: 3/0,1 mode: --/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21411 timeout: 0 lvb_type: 0 Jun 25 14:56:00 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561499670, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1e95469440/0x5d9ee6254034ca19 lrc: 3/0,1 mode: --/PW res: [0x2c002c286:0x22e6:0x0].0x0 bits 0x40/0x0 rrc: 29 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20555 timeout: 0 lvb_type: 0 Jun 25 14:56:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 133s: evicting client at 10.8.8.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f445a522400/0x5d9ee6253feda489 lrc: 3/0,0 mode: PW/PW res: [0x2c002c286:0x22e6:0x0].0x0 bits 0x40/0x0 rrc: 29 type: IBT flags: 0x60200400000020 nid: 10.8.8.22@o2ib6 remote: 0xadd31f4354a7f69f expref: 38637 pid: 97658 timeout: 614759 lvb_type: 0 Jun 25 14:56:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 25 14:57:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c93f154e-4163-fb6e-f3cf-dea798de7b5a (at 10.8.27.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f19ae21fc00, cur 1561499850 expire 1561499700 last 1561499623 Jun 25 14:59:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Jun 25 14:59:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jun 25 15:00:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e0767d77-866c-9038-3794-0af657e399d1 (at 10.8.8.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2055293c00, cur 1561500034 expire 1561499884 last 1561499807 Jun 25 15:00:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 15:02:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20083d7800, cur 1561500175 expire 1561500025 last 1561499948 Jun 25 15:02:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 15:02:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 15:03:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 25 15:04:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 25 15:04:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) reconnecting Jun 25 15:04:27 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jun 25 15:05:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 25 15:06:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jun 25 15:15:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 15:15:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 15:16:33 fir-md1-s1 kernel: Lustre: 23682:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3a78389b00 x1634476260813312/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:8/0 lens 480/568 e 1 to 0 dl 1561500998 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 15:16:33 fir-md1-s1 kernel: Lustre: 23682:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Jun 25 15:16:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 15:16:39 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 25 15:16:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 15:16:39 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jun 25 15:16:47 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f24243abcc0/0x5d9ee6254872031f lrc: 3/0,0 mode: PW/PW res: [0x2c0001757:0xc13:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3e6f4bca95 expref: 33 pid: 20462 timeout: 616067 lvb_type: 0 Jun 25 15:16:47 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 25 15:17:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520809c00, cur 1561501036 expire 1561500886 last 1561500809 Jun 25 15:18:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251348bc00, cur 1561501091 expire 1561500941 last 1561500864 Jun 25 15:20:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f161d69c400, cur 1561501250 expire 1561501100 last 1561501023 Jun 25 15:20:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 15:28:46 fir-md1-s1 kernel: Lustre: 22289:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f213edc9e00 x1636708518966384/t0(0) o101->1b90433c-235e-7531-cfe6-8ebc9f785a9b@10.9.0.64@o2ib4:21/0 lens 480/568 e 1 to 0 dl 1561501731 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 15:28:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 25 15:28:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 25 15:28:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 15:29:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f17534018c0/0x5d9ee6254bde111d lrc: 3/0,0 mode: PW/PW res: [0x2c002bf03:0x6557:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0x33d88ec184ad2d05 expref: 210 pid: 27315 timeout: 616800 lvb_type: 0 Jun 25 15:32:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f29d0b45c00, cur 1561501968 expire 1561501818 last 1561501741 Jun 25 15:37:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 25 15:37:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 15:37:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 15:38:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 25 15:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 25 15:40:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 25 15:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 25 15:40:59 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jun 25 15:41:01 fir-md1-s1 kernel: Lustre: 97651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e68e5cb00 x1634476271081664/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:6/0 lens 480/568 e 1 to 0 dl 1561502466 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 15:41:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.102.21@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f44e4c48900/0x5d9ee6254ea4af69 lrc: 3/0,0 mode: PW/PW res: [0x2c002be60:0x9d6:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.9.102.21@o2ib4 remote: 0xa578ba1cd90dc370 expref: 338 pid: 10362 timeout: 617535 lvb_type: 0 Jun 25 15:41:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 15:42:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 25 15:42:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 15:47:20 fir-md1-s1 kernel: Lustre: 20213:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561502239/real 1561502239] req@ffff8f12992d7800 x1636714437752368/t0(0) o6->fir-OST0020-osc-MDT0002@10.0.10.105@o2ib7:28/4 lens 544/432 e 23 to 1 dl 1561502840 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 25 15:47:20 fir-md1-s1 kernel: Lustre: 20213:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 25 15:47:20 fir-md1-s1 kernel: Lustre: fir-OST0020-osc-MDT0002: Connection to fir-OST0020 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jun 25 15:57:21 fir-md1-s1 kernel: Lustre: 20213:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561502840/real 1561502840] req@ffff8f12992d7800 x1636714437752368/t0(0) o6->fir-OST0020-osc-MDT0002@10.0.10.105@o2ib7:28/4 lens 544/432 e 23 to 1 dl 1561503441 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 15:57:21 fir-md1-s1 kernel: Lustre: fir-OST0020-osc-MDT0002: Connection to fir-OST0020 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jun 25 15:57:21 fir-md1-s1 kernel: Lustre: fir-OST0020-osc-MDT0002: Connection restored to 10.0.10.105@o2ib7 (at 10.0.10.105@o2ib7) Jun 25 15:57:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 25 16:01:02 fir-md1-s1 kernel: Lustre: 23699:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f44bb01c800 x1636708651115424/t0(0) 
o101->1b90433c-235e-7531-cfe6-8ebc9f785a9b@10.9.0.64@o2ib4:7/0 lens 480/568 e 1 to 0 dl 1561503667 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 16:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 25 16:01:08 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 25 16:02:17 fir-md1-s1 kernel: LustreError: 25677:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561503647, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1b29e0e780/0x5d9ee62555b9bb86 lrc: 3/1,0 mode: --/PR res: [0x200021916:0x3ef:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 25677 timeout: 0 lvb_type: 0 Jun 25 16:03:16 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.105.69@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f16cabf33c0/0x5d9ee6255411ae28 lrc: 3/0,0 mode: PW/PW res: [0x200021916:0x3ef:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.9.105.69@o2ib4 remote: 0xe87422714ee72752 expref: 69 pid: 23715 timeout: 618856 lvb_type: 0 Jun 25 16:07:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 095971d4-2c15-c9c6-8336-964f67ec504b (at 10.9.105.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3a27ab3400, cur 1561504045 expire 1561503895 last 1561503818 Jun 25 16:08:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 161 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22f3f6e800, cur 1561504121 expire 1561503971 last 1561503960 Jun 25 16:09:44 fir-md1-s1 kernel: Lustre: 23555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a74ab8c00 x1636708656626272/t0(0) o101->1b90433c-235e-7531-cfe6-8ebc9f785a9b@10.9.0.64@o2ib4:19/0 lens 480/568 e 1 to 0 dl 1561504189 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 16:09:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522539800, cur 1561504187 expire 1561504037 last 1561503960 Jun 25 16:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 25 16:09:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jun 25 16:09:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.58@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f11a0ba4c80/0x5d9ee6254ed8ce2d lrc: 3/0,0 mode: PW/PW res: [0x2c002c0cb:0x5289:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.101.58@o2ib4 remote: 0xa3260ff1ba69df1c expref: 1943 pid: 23738 timeout: 619258 lvb_type: 0 Jun 25 16:10:07 fir-md1-s1 kernel: LustreError: 31007:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.101.58@o2ib4 arrived at 1561504207 with bad export cookie 6746082289091244866 Jun 25 16:10:07 fir-md1-s1 kernel: LustreError: 31007:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 13405 previous similar messages Jun 25 16:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f29e1e71-511a-3e98-949d-3f54561359cc (at 10.9.101.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2070f5c400, cur 1561504434 expire 1561504284 last 1561504207 Jun 25 16:13:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 16:14:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 16:14:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 16:18:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1dfeb79400, cur 1561504696 expire 1561504546 last 1561504469 Jun 25 16:18:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1af3fb9c00, cur 1561504715 expire 1561504565 last 1561504488 Jun 25 16:18:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 16:18:56 fir-md1-s1 kernel: Lustre: 21145:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f411f781800 x1634476328034336/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:1/0 lens 480/568 e 1 to 0 dl 1561504741 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 16:19:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e15f364b-b556-833b-9c7c-0e0e1407bf82 (at 10.9.0.62@o2ib4) reconnecting Jun 25 16:19:02 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 25 16:20:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jun 25 16:20:05 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jun 25 16:20:11 fir-md1-s1 kernel: LustreError: 23639:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561504721, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3999702400/0x5d9ee6255bf0f49a lrc: 3/1,0 mode: --/PR res: [0x2c002a161:0xded:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23639 timeout: 0 lvb_type: 0 Jun 25 16:21:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.102.21@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e8490f2c0/0x5d9ee6255bc9111b lrc: 3/0,0 mode: PW/PW res: [0x2c002a161:0xded:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.9.102.21@o2ib4 remote: 0xa578ba1cd90dd59f expref: 90 pid: 23573 timeout: 619930 lvb_type: 0 Jun 25 16:25:22 fir-md1-s1 kernel: Lustre: 97651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f1719849e00 x1635199131755344/t0(0) o101->018b4088-9100-7f5b-2709-38dd7f461ac7@10.8.8.29@o2ib6:27/0 lens 480/568 e 1 to 0 dl 1561505127 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 16:29:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 63c454e3-b29e-031b-b57d-b0e507f25d19 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2496e04800, cur 1561505383 expire 1561505233 last 1561505156 Jun 25 16:30:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jun 25 16:30:32 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jun 25 16:34:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 16:34:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 16:34:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) reconnecting Jun 25 16:34:12 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jun 25 16:34:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.68@o2ib6, removing former export from same NID Jun 25 16:37:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1da7775c00, cur 1561505876 expire 1561505726 last 1561505649 Jun 25 16:37:56 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jun 25 16:39:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 16:40:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 25 16:40:45 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jun 25 16:41:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 16:41:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 16:41:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.68@o2ib6, removing former export from same NID Jun 25 16:43:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.68@o2ib6, removing former export from same NID Jun 25 16:43:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.68@o2ib6, removing former export from same NID Jun 25 16:43:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 16:44:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) reconnecting Jun 25 16:44:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jun 25 16:54:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9cf6b6d2-1109-4702-108d-d26e95bd0151 (at 10.8.14.5@o2ib6) Jun 25 16:54:30 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jun 25 16:58:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1dfee81c00, cur 1561507093 expire 1561506943 last 1561506866 Jun 25 16:58:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 16:58:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f281d610800, cur 1561507109 expire 1561506959 last 1561506882 Jun 25 16:59:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 87da5719-38f8-e25f-27bd-899baebba0f4 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e097a4400, cur 1561507194 expire 1561507044 last 1561506967 Jun 25 17:01:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 25 17:01:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:01:38 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561507291/real 1561507291] req@ffff8f1044215a00 x1636714449159776/t0(0) o104->fir-MDT0000@10.8.8.37@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561507298 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 25 17:01:45 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561507298/real 1561507298] req@ffff8f1044215a00 x1636714449159776/t0(0) o104->fir-MDT0000@10.8.8.37@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561507305 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 17:01:45 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 25 17:01:46 fir-md1-s1 kernel: Lustre: 23574:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f08486c2100 x1634476337723504/t0(0) o36->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:21/0 lens 512/2888 e 1 to 0 dl 1561507311 ref 2 fl 
Interpret:/0/0 rc 0/0 Jun 25 17:01:47 fir-md1-s1 kernel: Lustre: 10149:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3088a00300 x1634476337723712/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:22/0 lens 576/3264 e 1 to 0 dl 1561507312 ref 2 fl Interpret:/0/0 rc 0/0 Jun 25 17:02:00 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561507312/real 1561507312] req@ffff8f1044215a00 x1636714449159776/t0(0) o104->fir-MDT0000@10.8.8.37@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561507319 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 25 17:02:00 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jun 25 17:02:07 fir-md1-s1 kernel: LustreError: 23598:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.8.37@o2ib6) failed to reply to blocking AST (req@ffff8f1044215a00 x1636714449159776 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1b1caea880/0x5d9ee62561f4661d lrc: 4/0,0 mode: CR/CR res: [0x2000297d4:0xab9b:0x0].0x0 bits 0x9/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.8.37@o2ib6 remote: 0xb50bab6d0e7b6fcf expref: 5430 pid: 21461 timeout: 622409 lvb_type: 0 Jun 25 17:02:07 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.8.37@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 25 17:02:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.8.8.37@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1b1caea880/0x5d9ee62561f4661d lrc: 3/0,0 mode: CR/CR res: [0x2000297d4:0xab9b:0x0].0x0 bits 0x9/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.8.37@o2ib6 remote: 0xb50bab6d0e7b6fcf expref: 5431 pid: 21461 timeout: 0 lvb_type: 0 Jun 25 17:03:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from 
client b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34e1e7d000, cur 1561507427 expire 1561507277 last 1561507200 Jun 25 17:04:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1e1769d3-ffba-a4ec-e5e5-cf0cf094a85d (at 10.8.8.37@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f250bd87400, cur 1561507454 expire 1561507304 last 1561507227 Jun 25 17:04:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f140f57a000, cur 1561507513 expire 1561507363 last 1561507286 Jun 25 17:05:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:12:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 25 17:12:04 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jun 25 17:12:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 17:12:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 17:12:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 25 17:14:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) reconnecting Jun 25 17:14:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 17:15:51 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22d701d000, cur 1561508151 expire 1561508001 last 1561507924 Jun 25 17:16:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a6b241c00, cur 1561508214 expire 1561508064 last 1561507987 Jun 25 17:16:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:17:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 17:18:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9b20a7cb-a3fc-d0ca-5cea-5de703dce72f (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24a6eea800, cur 1561508285 expire 1561508135 last 1561508058 Jun 25 17:19:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 168 seconds. I think it's dead, and I am evicting it. exp ffff8f24f3ea7c00, cur 1561508361 expire 1561508211 last 1561508193 Jun 25 17:19:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:20:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148ebd7400, cur 1561508420 expire 1561508270 last 1561508193 Jun 25 17:23:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 17:23:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.0.64@o2ib4, removing former export from same NID Jun 25 17:23:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 25 17:23:58 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 25 17:24:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25388da800, cur 1561508654 expire 1561508504 last 1561508427 Jun 25 17:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 25 17:24:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 17:25:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 17:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 25 17:38:13 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jun 25 17:42:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20dbfc4000, cur 1561509720 expire 1561509570 last 1561509493 Jun 25 17:42:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:42:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 17:42:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 17:46:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f2c762800, cur 1561509973 expire 1561509823 last 1561509746 Jun 25 17:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ef3be2c00, cur 1561510016 expire 1561509866 last 1561509789 Jun 25 17:46:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 17:48:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 25 17:48:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 25 17:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2521bef800, cur 1561510328 expire 1561510178 last 1561510101 Jun 25 17:53:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 25 17:53:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 25 17:53:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.66@o2ib6, removing former export from same NID Jun 25 17:53:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) reconnecting Jun 25 17:53:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 18:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) reconnecting Jun 25 18:09:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 18:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 25 18:09:42 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jun 25 18:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a27aec800, cur 1561511492 expire 1561511342 last 1561511265 Jun 25 18:13:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1af42b1c00, cur 1561511609 expire 1561511459 last 1561511382 Jun 25 18:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 25 18:38:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d0f4c77-c27b-6d80-d629-873de917b74e (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2522978800, cur 1561513089 expire 1561512939 last 1561512862 Jun 25 18:38:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 18:38:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 111ade33-4633-d4f3-7359-6217f5551ac0 (at 10.8.14.9@o2ib6) Jun 25 18:39:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8ea16e6d-a041-cebf-bc4c-b2c20885e699 (at 10.8.14.9@o2ib6) in 188 seconds. I think it's dead, and I am evicting it. exp ffff8f24ee4ae400, cur 1561513165 expire 1561513015 last 1561512977 Jun 25 18:40:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8ea16e6d-a041-cebf-bc4c-b2c20885e699 (at 10.8.14.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2502d7bc00, cur 1561513206 expire 1561513056 last 1561512979 Jun 25 20:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 25 20:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 25 20:42:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 20:42:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) reconnecting Jun 25 20:43:42 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1505447400, cur 1561520622 expire 1561520472 last 1561520395 Jun 25 20:43:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 20:45:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd073587-8042-ffd0-09f1-ff79e8722875 (at 10.9.0.63@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2530f9e000, cur 1561520754 expire 1561520604 last 1561520527 Jun 25 20:47:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 25 20:47:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 20:47:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.63@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 25 20:51:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0846a2a000, cur 1561521064 expire 1561520914 last 1561520837 Jun 25 20:51:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 25 21:02:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jun 25 21:51:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to acd26ab4-a020-fbc0-1a40-f0e7d759131f (at 10.8.23.14@o2ib6) Jun 25 21:51:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 21:51:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6e7eede8-baef-e511-db4f-923a79b34ba3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f451de70800, cur 1561524697 expire 1561524547 last 1561524470 Jun 25 22:29:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c88d882f-e4f4-4b30-616b-f60f68016c23 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f19e686bc00, cur 1561526996 expire 1561526846 last 1561526769 Jun 25 22:29:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 25 22:30:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to acd26ab4-a020-fbc0-1a40-f0e7d759131f (at 10.8.23.14@o2ib6) Jun 25 22:30:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 00:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 939c0635-d3e5-7945-6eca-6a92a2676304 (at 10.9.101.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4535c8ec00, cur 1561532402 expire 1561532252 last 1561532175 Jun 26 00:00:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 00:00:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 939c0635-d3e5-7945-6eca-6a92a2676304 (at 10.9.101.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1506081400, cur 1561532415 expire 1561532265 last 1561532188 Jun 26 00:00:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 26 00:01:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f017f489-eef8-cd54-4b70-e8f0166c7c7c (at 10.8.8.25@o2ib6) in 188 seconds. I think it's dead, and I am evicting it. exp ffff8f2522959000, cur 1561532478 expire 1561532328 last 1561532290 Jun 26 00:01:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f017f489-eef8-cd54-4b70-e8f0166c7c7c (at 10.8.8.25@o2ib6) in 201 seconds. I think it's dead, and I am evicting it. exp ffff8f3509f55400, cur 1561532491 expire 1561532341 last 1561532290 Jun 26 00:01:57 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 36129863-f97a-d76f-0f90-11f02517721a (at 10.8.8.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1489ff5c00, cur 1561532517 expire 1561532367 last 1561532290 Jun 26 00:02:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 26 00:02:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 01:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 967359b8-6075-fa10-8749-55133a475ab0 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520d55000, cur 1561538230 expire 1561538080 last 1561538003 Jun 26 01:54:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 26 01:54:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 01:55:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 09fe1fc8-d186-6314-b715-72bcbbf4dcb1 (at 10.8.1.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2509c65800, cur 1561539336 expire 1561539186 last 1561539109 Jun 26 01:55:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 01:59:18 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b7a06525-6fdb-7245-d004-135045c5b952 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3ab6aa5800, cur 1561539558 expire 1561539408 last 1561539331 Jun 26 01:59:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 01:59:20 fir-md1-s1 kernel: LustreError: 97661:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) returned error from blocking AST (req@ffff8f1bdcfdc200 x1636715389943856 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f24557dc5c0/0x5d9ee6262033f223 lrc: 4/0,0 mode: PR/PR res: [0x20002993d:0x1b0:0x0].0x0 bits 0x1b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x76a5494a455f2a91 expref: 31 pid: 50446 timeout: 654769 lvb_type: 0 Jun 26 01:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d708e92-2967-fb68-a999-b8fb560068d3 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a682a9c00, cur 1561539560 expire 1561539410 last 1561539333 Jun 26 01:59:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 26 01:59:20 fir-md1-s1 kernel: LustreError: 97661:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Jun 26 01:59:20 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Jun 26 01:59:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 26 01:59:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec76f1db-9c9b-bbe0-847f-90a9d517c8dc (at 10.8.9.8@o2ib6) Jun 26 01:59:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 02:12:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Jun 26 02:12:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 02:24:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b4a2e41f-34ef-236e-f48b-7a4e4b82c56e (at 10.9.101.4@o2ib4) Jun 26 02:24:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar 
messages Jun 26 02:25:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.8.25@o2ib6) Jun 26 02:25:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 02:29:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to be77157a-c39a-b0a3-f5b0-4e7917893782 (at 10.8.1.35@o2ib6) Jun 26 02:29:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 26 04:01:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 26 04:01:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 06:47:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 26 06:47:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 11:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0c122ba5-e660-84b0-99ae-db1f65f35f74 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e17ea9000, cur 1561574671 expire 1561574521 last 1561574444 Jun 26 11:44:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec76f1db-9c9b-bbe0-847f-90a9d517c8dc (at 10.8.9.8@o2ib6) Jun 26 11:44:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 11:44:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0c122ba5-e660-84b0-99ae-db1f65f35f74 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0da9ed6400, cur 1561574681 expire 1561574531 last 1561574454 Jun 26 11:44:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 26 13:01:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jun 26 13:01:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 13:25:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 26 13:25:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) reconnecting Jun 26 13:25:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 26 13:25:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 26 13:25:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 26 13:25:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) reconnecting Jun 26 13:25:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 26 13:25:41 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 26 13:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) reconnecting Jun 26 13:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 26 13:25:49 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 26 13:25:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) reconnecting Jun 26 13:25:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 26 13:25:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 26 13:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) reconnecting Jun 26 13:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 26 13:26:13 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 26 13:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) reconnecting Jun 26 13:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jun 26 18:51:16 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jun 26 18:51:16 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jun 26 20:08:10 fir-md1-s1 kernel: Lustre: 10197:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:08:10 fir-md1-s1 kernel: Lustre: 10197:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 75 previous similar messages Jun 26 20:09:08 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:09:08 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jun 26 20:15:28 fir-md1-s1 kernel: Lustre: 
21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:15:28 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jun 26 20:23:56 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:23:56 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 102 previous similar messages Jun 26 20:24:06 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:24:06 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 26 20:26:43 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:26:43 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 47 previous similar messages Jun 26 20:30:13 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:30:13 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1375 previous similar messages Jun 26 20:42:00 fir-md1-s1 kernel: Lustre: 23683:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:42:00 fir-md1-s1 kernel: Lustre: 23683:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jun 26 20:42:34 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:42:34 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jun 26 20:47:21 fir-md1-s1 kernel: Lustre: 
23578:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:47:21 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 24 previous similar messages Jun 26 20:53:40 fir-md1-s1 kernel: Lustre: 23683:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:54:27 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:54:27 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 169 previous similar messages Jun 26 20:58:15 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 20:58:15 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jun 26 21:01:54 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 21:43:00 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 26 21:43:00 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 516 previous similar messages Jun 26 21:44:21 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 26 21:44:21 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 40 previous similar messages Jun 26 21:47:00 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 26 21:47:00 fir-md1-s1 kernel: LustreError: 
20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 27 previous similar messages Jun 26 22:08:33 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 26 22:08:33 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages Jun 26 22:58:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jun 26 22:58:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 27 00:51:42 fir-md1-s1 kernel: LustreError: 21545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 27 00:51:42 fir-md1-s1 kernel: LustreError: 21545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 47 previous similar messages Jun 27 00:55:52 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 27 01:03:11 fir-md1-s1 kernel: LustreError: 22157:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 27 01:03:11 fir-md1-s1 kernel: LustreError: 22157:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 27 01:12:03 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 27 01:12:03 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 27 01:20:15 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 27 01:20:15 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 27 01:21:01 
fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 27 01:31:12 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 27 01:31:12 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jun 27 01:55:57 fir-md1-s1 kernel: Lustre: 23632:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 27 01:57:27 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 27 03:01:01 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jun 27 03:35:12 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 27 03:35:12 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 27 03:36:33 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 27 03:36:33 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 48 previous similar messages Jun 27 03:39:24 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 27 03:39:24 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 27 03:46:14 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real 
grant 0 Jun 27 03:46:14 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 27 04:29:20 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jun 27 11:57:14 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 11:58:30 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 11:58:30 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 50 previous similar messages Jun 27 12:01:01 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:01:01 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 112 previous similar messages Jun 27 12:06:01 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:06:01 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 226 previous similar messages Jun 27 12:16:02 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:16:02 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 461 previous similar messages Jun 27 12:26:03 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:26:03 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 459 
previous similar messages Jun 27 12:36:03 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:36:03 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 460 previous similar messages Jun 27 12:46:04 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:46:04 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 449 previous similar messages Jun 27 12:56:05 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 12:56:05 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 457 previous similar messages Jun 27 13:06:05 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 13:06:05 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 457 previous similar messages Jun 27 13:16:06 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 13:16:06 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 455 previous similar messages Jun 27 13:26:07 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 13:26:07 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 27 13:36:08 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 13:36:08 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 454 previous similar messages Jun 27 13:46:08 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 13:46:08 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 450 previous similar messages Jun 27 13:56:09 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 13:56:09 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 450 previous similar messages Jun 27 14:06:10 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 14:06:10 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 459 previous similar messages Jun 27 14:16:10 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 14:16:10 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 458 previous similar messages Jun 27 14:26:12 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 14:26:12 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 452 previous similar messages Jun 27 14:36:12 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 14:36:12 fir-md1-s1 kernel: LustreError: 
46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 27 14:46:13 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 14:46:13 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 452 previous similar messages Jun 27 14:56:13 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 14:56:13 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 451 previous similar messages Jun 27 15:06:13 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 15:06:13 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 447 previous similar messages Jun 27 15:16:13 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 15:16:13 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 451 previous similar messages Jun 27 15:26:14 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 15:26:14 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 448 previous similar messages Jun 27 15:36:15 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 15:36:15 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 450 previous similar messages Jun 27 15:43:50 fir-md1-s1 kernel: Lustre: 
10589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675423/real 1561675423] req@ffff8f10f8661200 x1636716613077104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675430 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 27 15:43:50 fir-md1-s1 kernel: Lustre: 10589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jun 27 15:43:57 fir-md1-s1 kernel: Lustre: 23701:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675430/real 1561675430] req@ffff8f0ae2ea7b00 x1636716613077152/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675437 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 27 15:43:57 fir-md1-s1 kernel: Lustre: 23701:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 27 15:43:58 fir-md1-s1 kernel: Lustre: 20571:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f140aa09500 x1637002162748912/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:3/0 lens 480/568 e 1 to 0 dl 1561675443 ref 2 fl Interpret:/0/0 rc 0/0 Jun 27 15:43:58 fir-md1-s1 kernel: Lustre: 20571:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jun 27 15:44:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jun 27 15:44:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 27 15:44:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:44:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 27 15:44:04 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675437/real 1561675437] req@ffff8f1cbd3cf500 x1636716613077120/t0(0) 
o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675444 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 27 15:44:04 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 27 15:44:18 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675451/real 1561675451] req@ffff8f1cbd3cf500 x1636716613077120/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675458 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 27 15:44:18 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jun 27 15:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jun 27 15:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:44:39 fir-md1-s1 kernel: Lustre: 10589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675472/real 1561675472] req@ffff8f10f8661200 x1636716613077104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675479 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 27 15:44:39 fir-md1-s1 kernel: Lustre: 10589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jun 27 15:44:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jun 27 15:44:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jun 27 15:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:45:14 
fir-md1-s1 kernel: Lustre: 23701:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675507/real 1561675507] req@ffff8f0ae2ea7b00 x1636716613077152/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675514 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 27 15:45:14 fir-md1-s1 kernel: Lustre: 23701:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages Jun 27 15:45:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jun 27 15:45:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:45:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:46:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jun 27 15:46:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 27 15:46:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jun 27 15:46:15 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 15:46:15 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 428 previous similar messages Jun 27 15:46:24 fir-md1-s1 kernel: Lustre: 23701:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561675577/real 1561675577] req@ffff8f0ae2ea7b00 x1636716613077152/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561675584 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 27 15:46:24 fir-md1-s1 kernel: Lustre: 23701:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 29 previous similar messages Jun 27 15:46:37 
fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b09d4c25-b109-b30c-132e-6a644105be34 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ea696400, cur 1561675597 expire 1561675447 last 1561675370 Jun 27 15:46:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b09d4c25-b109-b30c-132e-6a644105be34 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fa790b000, cur 1561675612 expire 1561675462 last 1561675385 Jun 27 15:46:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 27 15:46:52 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (188:1s); client may timeout. req@ffff8f0cc4698900 x1637002162748944/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:3/0 lens 480/536 e 1 to 0 dl 1561675611 ref 1 fl Complete:/0/0 rc 301/301 Jun 27 15:46:52 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1453 previous similar messages Jun 27 15:48:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 27 15:48:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 27 15:50:54 fir-md1-s1 kernel: LustreError: 21460:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2d39ed9500 x1636716686208928/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 27 15:50:54 fir-md1-s1 kernel: LustreError: 21460:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Jun 27 15:50:56 fir-md1-s1 kernel: LustreError: 21446:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2d1fe59500 x1636716686210688/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 27 15:50:56 fir-md1-s1 kernel: LustreError: 
21446:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jun 27 15:51:09 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19ad73a100 x1636512232467120/t0(0) o101->3429bec6-fe2a-19ec-4f0c-bb576fed4ff4@10.8.29.4@o2ib6:14/0 lens 480/568 e 1 to 0 dl 1561675874 ref 2 fl Interpret:/0/0 rc 0/0 Jun 27 15:51:09 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jun 27 15:51:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3429bec6-fe2a-19ec-4f0c-bb576fed4ff4 (at 10.8.29.4@o2ib6) reconnecting Jun 27 15:51:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 27 15:51:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0ed884ea-fa51-544e-85e4-1d3a8c288fe4 (at 10.8.29.4@o2ib6) Jun 27 15:51:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 27 15:51:35 fir-md1-s1 kernel: LustreError: 97644:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f169db1ef00 x1636716686243552/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 27 15:51:35 fir-md1-s1 kernel: LustreError: 97644:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Jun 27 15:52:24 fir-md1-s1 kernel: LustreError: 97670:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561675854, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1ab63c4a40/0x5d9ee62884870864 lrc: 3/0,1 mode: --/PW res: [0x2000222aa:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97670 timeout: 0 lvb_type: 0 Jun 27 15:52:24 fir-md1-s1 kernel: LustreError: 97670:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jun 27 15:52:36 
fir-md1-s1 kernel: Lustre: 23578:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0c54094b00 x1637002172029856/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:11/0 lens 592/3264 e 0 to 0 dl 1561675961 ref 2 fl Interpret:/0/0 rc 0/0 Jun 27 15:52:36 fir-md1-s1 kernel: Lustre: 23578:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jun 27 15:53:05 fir-md1-s1 kernel: LustreError: 97644:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561675895, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f22e074ec00/0x5d9ee62884a9d9f6 lrc: 3/0,1 mode: --/EX res: [0x200029c2e:0x62:0x0].0x0 bits 0x21/0x0 rrc: 5 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97644 timeout: 0 lvb_type: 0 Jun 27 15:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Jun 27 15:53:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f340ef8f980/0x5d9ee6287f84cd5d lrc: 3/0,0 mode: PR/PR res: [0x2000222aa:0x10f:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0xe063eaa9a5d7f7c1 expref: 90843 pid: 97666 timeout: 791063 lvb_type: 0 Jun 27 15:53:39 fir-md1-s1 kernel: LustreError: 97644:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1d48b63c00 x1636716686375312/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 27 15:53:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3429bec6-fe2a-19ec-4f0c-bb576fed4ff4 (at 10.8.29.4@o2ib6) reconnecting Jun 27 15:53:42 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jun 27 15:53:42 
fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0ed884ea-fa51-544e-85e4-1d3a8c288fe4 (at 10.8.29.4@o2ib6)
Jun 27 15:53:42 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages
Jun 27 15:54:14 fir-md1-s1 kernel: LNet: Service thread pid 97670 was inactive for 200.24s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 27 15:54:14 fir-md1-s1 kernel: Pid: 97670, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Jun 27 15:54:14 fir-md1-s1 kernel: Call Trace:
Jun 27 15:54:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 27 15:54:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Jun 27 15:54:14 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Jun 27 15:54:14 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Jun 27 15:54:14 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Jun 27 15:54:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Jun 27 15:54:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Jun 27 15:54:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Jun 27 15:54:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Jun 27 15:54:14 fir-md1-s1 kernel: [] 0xffffffffffffffff
Jun 27 15:54:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561676054.97670
Jun 27 15:54:22 fir-md1-s1 kernel: LNet: Service
thread pid 97670 completed after 208.25s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 27 15:56:16 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 15:56:16 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 448 previous similar messages Jun 27 16:06:16 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 16:06:16 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 27 16:16:18 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 16:16:18 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 27 16:26:19 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 16:26:19 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 467 previous similar messages Jun 27 16:36:21 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 16:36:21 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 27 16:46:21 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 16:46:21 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous 
similar messages Jun 27 16:56:21 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 16:56:21 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 27 17:06:22 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 17:06:22 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 466 previous similar messages Jun 27 17:16:23 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 17:16:23 fir-md1-s1 kernel: LustreError: 22648:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 27 17:26:23 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 17:26:23 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 459 previous similar messages Jun 27 17:36:24 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 17:36:24 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 462 previous similar messages Jun 27 17:46:26 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 17:46:26 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 454 previous similar messages Jun 27 17:56:26 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 17:56:26 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 455 previous similar messages Jun 27 18:06:27 fir-md1-s1 kernel: LustreError: 46529:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 18:06:27 fir-md1-s1 kernel: LustreError: 46529:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 431 previous similar messages Jun 27 18:16:27 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 18:16:27 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 404 previous similar messages Jun 27 18:26:28 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 18:26:28 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 397 previous similar messages Jun 27 18:36:29 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 18:36:29 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 396 previous similar messages Jun 27 18:46:29 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 18:46:29 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 395 previous similar messages Jun 27 18:48:54 fir-md1-s1 kernel: Lustre: 23565:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jun 27 18:56:29 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 18:56:29 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 394 previous similar messages Jun 27 19:06:31 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 19:06:31 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 395 previous similar messages Jun 27 19:16:32 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 19:16:32 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 393 previous similar messages Jun 27 19:26:32 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 19:26:32 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 430 previous similar messages Jun 27 19:36:33 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 19:36:33 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 433 previous similar messages Jun 27 19:46:33 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 19:46:33 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 408 previous similar messages Jun 27 19:56:34 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 19:56:34 fir-md1-s1 kernel: LustreError: 
46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 411 previous similar messages Jun 27 20:06:35 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 20:06:35 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 454 previous similar messages Jun 27 20:16:36 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 20:16:36 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 27 20:26:37 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 20:26:37 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 453 previous similar messages Jun 27 20:36:38 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 20:36:38 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 457 previous similar messages Jun 27 20:46:38 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 20:46:38 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 461 previous similar messages Jun 27 20:56:38 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 20:56:38 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 455 previous similar messages Jun 27 21:06:39 fir-md1-s1 kernel: LustreError: 
56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 21:06:39 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 458 previous similar messages Jun 27 21:16:39 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 21:16:39 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 454 previous similar messages Jun 27 21:26:40 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 21:26:40 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 453 previous similar messages Jun 27 21:36:41 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 21:36:41 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 459 previous similar messages Jun 27 21:46:42 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 21:46:42 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 455 previous similar messages Jun 27 21:56:43 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 21:56:43 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 453 previous similar messages Jun 27 22:06:44 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 
22:06:44 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 461 previous similar messages Jun 27 22:16:44 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 22:16:44 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 27 22:26:46 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 22:26:46 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 467 previous similar messages Jun 27 22:36:47 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 22:36:47 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 444 previous similar messages Jun 27 22:46:48 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 22:46:48 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 452 previous similar messages Jun 27 22:56:48 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 22:56:48 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 489 previous similar messages Jun 27 23:06:49 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 23:06:49 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 498 previous similar messages Jun 27 23:16:50 
fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 23:16:50 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 495 previous similar messages Jun 27 23:26:50 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 23:26:50 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 499 previous similar messages Jun 27 23:36:51 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 23:36:51 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 509 previous similar messages Jun 27 23:46:52 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 23:46:52 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 502 previous similar messages Jun 27 23:56:52 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 27 23:56:52 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 507 previous similar messages Jun 28 00:06:53 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 00:06:53 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 466 previous similar messages Jun 28 00:16:54 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 
155648 GRANT, real grant 0 Jun 28 00:16:54 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 474 previous similar messages Jun 28 00:26:55 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 00:26:55 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 479 previous similar messages Jun 28 00:36:55 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 00:36:55 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 509 previous similar messages Jun 28 00:46:55 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 00:46:55 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 533 previous similar messages Jun 28 00:56:55 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 00:56:55 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 527 previous similar messages Jun 28 01:06:56 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 01:06:56 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 524 previous similar messages Jun 28 01:16:57 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 01:16:57 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 529 previous 
similar messages Jun 28 01:26:58 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 01:26:58 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 549 previous similar messages Jun 28 01:36:58 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 01:36:58 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 543 previous similar messages Jun 28 01:46:58 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 01:46:58 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 541 previous similar messages Jun 28 01:56:58 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 01:56:58 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 540 previous similar messages Jun 28 02:07:00 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 02:07:00 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 28 02:17:01 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 02:17:01 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 401 previous similar messages Jun 28 02:27:02 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 02:27:02 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 398 previous similar messages Jun 28 02:37:03 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 02:37:03 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 401 previous similar messages Jun 28 02:47:03 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 02:47:03 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 28 02:57:04 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 02:57:04 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 398 previous similar messages Jun 28 03:07:04 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 03:07:04 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 28 03:17:05 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 03:17:05 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 401 previous similar messages Jun 28 03:27:06 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 03:27:06 fir-md1-s1 kernel: LustreError: 
21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 28 03:37:07 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 03:37:07 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 399 previous similar messages Jun 28 03:47:08 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 03:47:08 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 396 previous similar messages Jun 28 03:57:09 fir-md1-s1 kernel: LustreError: 46593:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 03:57:09 fir-md1-s1 kernel: LustreError: 46593:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 387 previous similar messages Jun 28 04:07:10 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 04:07:10 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 365 previous similar messages Jun 28 04:17:11 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 04:17:11 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 378 previous similar messages Jun 28 04:27:12 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 04:27:12 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 387 previous similar messages Jun 28 04:37:12 fir-md1-s1 kernel: LustreError: 
21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 04:37:12 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 378 previous similar messages Jun 28 04:47:12 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 04:47:12 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 392 previous similar messages Jun 28 04:57:13 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 04:57:13 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 28 05:07:14 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 05:07:14 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 28 05:17:14 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 05:17:14 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 387 previous similar messages Jun 28 05:27:15 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 05:27:15 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 373 previous similar messages Jun 28 05:37:16 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 
05:37:16 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 387 previous similar messages Jun 28 05:47:17 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 05:47:17 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 378 previous similar messages Jun 28 05:57:17 fir-md1-s1 kernel: LustreError: 21717:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 05:57:17 fir-md1-s1 kernel: LustreError: 21717:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 386 previous similar messages Jun 28 06:04:46 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561727079/real 1561727079] req@ffff8f10fc24dd00 x1636717505315616/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561727086 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 28 06:04:46 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jun 28 06:04:54 fir-md1-s1 kernel: Lustre: 10506:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ad0e0ec00 x1637002685958656/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1561727099 ref 2 fl Interpret:/0/0 rc 0/0 Jun 28 06:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4c6d21f6-3e09-6b98-bf50-a29faf23fa85 (at 10.8.9.9@o2ib6) reconnecting Jun 28 06:05:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 28 06:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 28 06:05:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 28 06:05:44 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: Client 4c6d21f6-3e09-6b98-bf50-a29faf23fa85 (at 10.8.9.9@o2ib6) reconnecting Jun 28 06:05:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 28 06:05:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 28 06:05:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 28 06:07:19 fir-md1-s1 kernel: LustreError: 21291:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 06:07:19 fir-md1-s1 kernel: LustreError: 21291:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 391 previous similar messages Jun 28 06:17:19 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 06:17:19 fir-md1-s1 kernel: LustreError: 25633:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 376 previous similar messages Jun 28 06:27:20 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 06:27:20 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 381 previous similar messages Jun 28 06:37:22 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 06:37:22 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 370 previous similar messages Jun 28 06:47:22 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 06:47:22 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 06:57:23 fir-md1-s1 kernel: LustreError: 
27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 06:57:23 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 372 previous similar messages Jun 28 07:07:24 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 07:07:24 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 390 previous similar messages Jun 28 07:17:24 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 07:17:24 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jun 28 07:27:25 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 07:27:25 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 388 previous similar messages Jun 28 07:37:25 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 07:37:25 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 07:47:26 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 07:47:26 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 07:57:28 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 
07:57:28 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 08:07:29 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 08:07:29 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 28 08:17:30 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 08:17:30 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 08:27:31 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 08:27:31 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 08:37:32 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 08:37:32 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 08:47:33 fir-md1-s1 kernel: LustreError: 22226:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 08:47:33 fir-md1-s1 kernel: LustreError: 22226:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 08:57:33 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 08:57:33 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 28 09:07:34 
fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 09:07:34 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 09:17:35 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 09:17:35 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 383 previous similar messages Jun 28 09:27:36 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 09:27:36 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jun 28 09:37:37 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 09:37:37 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 09:47:37 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 09:47:37 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jun 28 09:57:37 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 09:57:37 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 28 10:07:37 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 
155648 GRANT, real grant 0 Jun 28 10:07:37 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 380 previous similar messages Jun 28 10:17:39 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 10:17:39 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 367 previous similar messages Jun 28 10:27:40 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 10:27:40 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 10:33:39 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jun 28 10:37:41 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 10:37:41 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jun 28 10:47:42 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 10:47:42 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 383 previous similar messages Jun 28 10:57:42 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 10:57:42 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 380 previous similar messages Jun 28 11:07:43 fir-md1-s1 kernel: LustreError: 22226:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 11:07:43 fir-md1-s1 kernel: LustreError: 22226:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 379 previous similar messages Jun 28 11:17:44 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 11:17:44 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 377 previous similar messages Jun 28 11:27:45 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 11:27:45 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 368 previous similar messages Jun 28 11:37:46 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 11:37:46 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 374 previous similar messages Jun 28 11:47:47 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 11:47:47 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 381 previous similar messages Jun 28 11:57:48 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 11:57:48 fir-md1-s1 kernel: LustreError: 57787:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 382 previous similar messages Jun 28 12:07:49 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 12:07:49 fir-md1-s1 kernel: LustreError: 
21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 428 previous similar messages Jun 28 12:17:50 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 12:17:50 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 491 previous similar messages Jun 28 12:27:50 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 12:27:50 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 500 previous similar messages Jun 28 12:37:50 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 12:37:50 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 498 previous similar messages Jun 28 12:47:51 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 12:47:51 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 494 previous similar messages Jun 28 12:57:52 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 12:57:52 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 490 previous similar messages Jun 28 13:07:52 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 13:07:52 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 499 previous similar messages Jun 28 13:17:52 fir-md1-s1 kernel: LustreError: 
46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 13:17:52 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 501 previous similar messages Jun 28 13:27:53 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 13:27:53 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 503 previous similar messages Jun 28 13:31:01 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jun 28 13:37:53 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 13:37:53 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 501 previous similar messages Jun 28 13:47:54 fir-md1-s1 kernel: LustreError: 22226:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 13:47:54 fir-md1-s1 kernel: LustreError: 22226:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 502 previous similar messages Jun 28 13:50:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 52b5dc52-8a4c-f64c-7b51-91709e30d8ba (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f326b3d1800, cur 1561755001 expire 1561754851 last 1561754774 Jun 28 13:50:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 52b5dc52-8a4c-f64c-7b51-91709e30d8ba (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f121c50b000, cur 1561755008 expire 1561754858 last 1561754781 Jun 28 13:52:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d8cc7b58-ee01-5501-ca65-c659f4724147 (at 10.9.106.54@o2ib4) Jun 28 13:57:55 fir-md1-s1 kernel: LustreError: 21713:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 13:57:55 fir-md1-s1 kernel: LustreError: 21713:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 499 previous similar messages Jun 28 14:07:56 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 14:07:56 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 487 previous similar messages Jun 28 14:17:56 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 14:17:56 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 503 previous similar messages Jun 28 14:27:57 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 14:27:57 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 500 previous similar messages Jun 28 14:37:57 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 14:37:57 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 502 previous similar messages Jun 28 14:47:58 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 14:47:58 fir-md1-s1 kernel: LustreError: 
20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 502 previous similar messages Jun 28 14:57:59 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 14:57:59 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 504 previous similar messages Jun 28 15:07:59 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 15:07:59 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 504 previous similar messages Jun 28 15:18:00 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 15:18:00 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 499 previous similar messages Jun 28 15:28:00 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 15:28:00 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 506 previous similar messages Jun 28 15:38:00 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 15:38:00 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 499 previous similar messages Jun 28 15:48:00 fir-md1-s1 kernel: LustreError: 46575:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 15:48:00 fir-md1-s1 kernel: LustreError: 46575:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 472 previous similar messages Jun 28 15:58:01 fir-md1-s1 kernel: LustreError: 
22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 15:58:01 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 527 previous similar messages Jun 28 16:08:01 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 16:08:01 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 516 previous similar messages Jun 28 16:13:22 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 28 16:13:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 172b52dd-0b9e-12f8-c21f-947aedff05a0 (at 10.8.18.2@o2ib6) reconnecting Jun 28 16:13:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2727f5d4-463f-2044-b04c-92df44e40c7d (at 10.8.18.2@o2ib6) Jun 28 16:13:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 28 16:14:35 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 28 16:15:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7ab2f51d-a689-9f2c-be74-3bf003bf5840 (at 10.8.0.66@o2ib6) reconnecting Jun 28 16:15:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jun 28 16:18:01 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 16:18:01 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 538 previous similar messages Jun 28 16:28:02 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, 
real grant 0 Jun 28 16:28:02 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 536 previous similar messages Jun 28 16:38:02 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 16:38:02 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 536 previous similar messages Jun 28 16:48:03 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 16:48:03 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 534 previous similar messages Jun 28 16:58:04 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 16:58:04 fir-md1-s1 kernel: LustreError: 46590:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 527 previous similar messages Jun 28 17:08:05 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 17:08:05 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 532 previous similar messages Jun 28 17:18:06 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 17:18:06 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 528 previous similar messages Jun 28 17:28:07 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 17:28:07 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 535 previous similar 
messages Jun 28 17:38:07 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 17:38:07 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 524 previous similar messages Jun 28 17:48:08 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 17:48:08 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 515 previous similar messages Jun 28 17:58:09 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 17:58:09 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 527 previous similar messages Jun 28 18:03:03 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 28 18:03:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6660433e-6178-3b9d-5600-564c37c5d5bd (at 10.8.8.26@o2ib6) reconnecting Jun 28 18:03:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 4bf93a7c-5f27-067e-124f-bc871b3eff21 (at 10.8.8.26@o2ib6) Jun 28 18:08:09 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 18:08:09 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 482 previous similar messages Jun 28 18:18:10 fir-md1-s1 kernel: LustreError: 21291:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 18:18:10 fir-md1-s1 kernel: LustreError: 21291:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 410 previous similar messages Jun 28 
18:28:10 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 18:28:10 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 411 previous similar messages Jun 28 18:38:11 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 18:38:11 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 410 previous similar messages Jun 28 18:48:12 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 18:48:12 fir-md1-s1 kernel: LustreError: 46561:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 28 18:58:13 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 18:58:13 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 411 previous similar messages Jun 28 19:08:13 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 19:08:13 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 405 previous similar messages Jun 28 19:18:14 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 19:18:14 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 28 19:28:14 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 
claims 155648 GRANT, real grant 0 Jun 28 19:28:14 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 28 19:38:14 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 19:38:14 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 408 previous similar messages Jun 28 19:48:15 fir-md1-s1 kernel: LustreError: 22157:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 19:48:15 fir-md1-s1 kernel: LustreError: 22157:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 405 previous similar messages Jun 28 19:58:16 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 19:58:16 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 413 previous similar messages Jun 28 20:08:17 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 20:08:17 fir-md1-s1 kernel: LustreError: 21390:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 400 previous similar messages Jun 28 20:18:17 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 20:18:17 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 400 previous similar messages Jun 28 20:28:19 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 20:28:19 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 408 
previous similar messages Jun 28 20:38:19 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 20:38:19 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 28 20:48:20 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 20:48:20 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 28 20:58:21 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 20:58:21 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 406 previous similar messages Jun 28 21:08:21 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 21:08:21 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 394 previous similar messages Jun 28 21:18:22 fir-md1-s1 kernel: LustreError: 46575:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 21:18:22 fir-md1-s1 kernel: LustreError: 46575:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 368 previous similar messages Jun 28 21:28:24 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 21:28:24 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 401 previous similar messages Jun 28 21:38:24 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 21:38:24 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 409 previous similar messages Jun 28 21:48:24 fir-md1-s1 kernel: LustreError: 46575:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 21:48:24 fir-md1-s1 kernel: LustreError: 46575:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 438 previous similar messages Jun 28 21:58:24 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 21:58:24 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 454 previous similar messages Jun 28 22:08:24 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 22:08:24 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 476 previous similar messages Jun 28 22:18:25 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 22:18:25 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 490 previous similar messages Jun 28 22:28:26 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 22:28:26 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 507 previous similar messages Jun 28 22:38:27 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 22:38:27 fir-md1-s1 kernel: LustreError: 
21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 522 previous similar messages Jun 28 22:48:27 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 22:48:27 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 515 previous similar messages Jun 28 22:53:52 fir-md1-s1 kernel: Lustre: 23650:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jun 28 22:58:27 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 22:58:27 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 516 previous similar messages Jun 28 23:08:29 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 23:08:29 fir-md1-s1 kernel: LustreError: 21542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 513 previous similar messages Jun 28 23:18:29 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 23:18:29 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 517 previous similar messages Jun 28 23:28:29 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 23:28:29 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 519 previous similar messages Jun 28 23:38:30 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 23:38:30 fir-md1-s1 kernel: 
LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 479 previous similar messages Jun 28 23:48:30 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 23:48:30 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 28 23:58:32 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 28 23:58:32 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 470 previous similar messages Jun 29 00:08:33 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 00:08:33 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 472 previous similar messages Jun 29 00:18:33 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 00:18:33 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 473 previous similar messages Jun 29 00:28:34 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 00:28:34 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 472 previous similar messages Jun 29 00:38:34 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 00:38:34 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 473 previous similar messages Jun 29 00:48:35 fir-md1-s1 kernel: 
LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 00:48:35 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 479 previous similar messages Jun 29 00:58:36 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 00:58:36 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 476 previous similar messages Jun 29 01:08:36 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 01:08:36 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 477 previous similar messages Jun 29 01:18:37 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 01:18:37 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 469 previous similar messages Jun 29 01:28:37 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 01:28:37 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 482 previous similar messages Jun 29 01:38:38 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 01:38:38 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 476 previous similar messages Jun 29 01:48:39 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real 
grant 0 Jun 29 01:48:39 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 479 previous similar messages Jun 29 01:58:40 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 01:58:40 fir-md1-s1 kernel: LustreError: 25997:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 481 previous similar messages Jun 29 02:08:42 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 02:08:42 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 483 previous similar messages Jun 29 02:18:42 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 02:18:42 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 485 previous similar messages Jun 29 02:28:43 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 02:28:43 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 483 previous similar messages Jun 29 02:38:44 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 02:38:44 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 482 previous similar messages Jun 29 02:48:44 fir-md1-s1 kernel: LustreError: 25634:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 02:48:44 fir-md1-s1 kernel: LustreError: 25634:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 481 previous similar messages 
Jun 29 02:58:45 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 02:58:45 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 479 previous similar messages Jun 29 03:08:46 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 03:08:46 fir-md1-s1 kernel: LustreError: 21712:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 482 previous similar messages Jun 29 03:18:46 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 03:18:46 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 478 previous similar messages Jun 29 03:28:47 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 03:28:47 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 479 previous similar messages Jun 29 03:38:48 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 03:38:48 fir-md1-s1 kernel: LustreError: 22156:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 427 previous similar messages Jun 29 03:42:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d4f346d-e38b-6c6e-266e-3da4c47c24e4 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24eff97c00, cur 1561804943 expire 1561804793 last 1561804716 Jun 29 03:42:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 29 03:43:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jun 29 03:48:48 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 03:48:48 fir-md1-s1 kernel: LustreError: 22427:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 455 previous similar messages Jun 29 03:58:50 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 03:58:50 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jun 29 04:08:50 fir-md1-s1 kernel: LustreError: 21545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 04:08:50 fir-md1-s1 kernel: LustreError: 21545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 29 04:15:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d5062c59-a286-2049-232b-def850bbc374 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1a7ce4a000, cur 1561806941 expire 1561806791 last 1561806714 Jun 29 04:15:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 04:16:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jun 29 04:16:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 04:18:50 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 04:18:50 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 472 previous similar messages Jun 29 04:28:51 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 04:28:51 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jun 29 04:38:52 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 04:38:52 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 466 previous similar messages Jun 29 04:48:52 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 04:48:52 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 460 previous similar messages Jun 29 04:56:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ae08e35a-e0d0-58b5-17ae-e4363256cb18 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1bdc869800, cur 1561809404 expire 1561809254 last 1561809177 Jun 29 04:56:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 04:57:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jun 29 04:57:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 04:58:53 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 04:58:53 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 29 05:08:54 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 05:08:54 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 461 previous similar messages Jun 29 05:18:55 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 05:18:55 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 29 05:28:55 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 05:28:55 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 433 previous similar messages Jun 29 05:38:56 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 05:38:56 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jun 29 05:48:57 fir-md1-s1 kernel: LustreError: 25634:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 05:48:57 fir-md1-s1 kernel: LustreError: 25634:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 461 previous similar messages Jun 29 05:58:58 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 05:58:58 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 29 06:08:59 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 06:08:59 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 29 06:18:59 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 06:18:59 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jun 29 06:28:59 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 06:28:59 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 459 previous similar messages Jun 29 06:39:00 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 06:39:00 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 453 previous similar messages Jun 29 06:49:01 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 06:49:01 fir-md1-s1 kernel: LustreError: 
21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 459 previous similar messages Jun 29 06:59:01 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 06:59:01 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 469 previous similar messages Jun 29 07:09:02 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 07:09:02 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 29 07:19:02 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 07:19:02 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 07:29:02 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 07:29:02 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 07:39:03 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 07:39:03 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 466 previous similar messages Jun 29 07:49:04 fir-md1-s1 kernel: LustreError: 25972:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 07:49:04 fir-md1-s1 kernel: LustreError: 25972:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jun 29 07:59:05 fir-md1-s1 kernel: LustreError: 
21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 07:59:05 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jun 29 08:09:05 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 08:09:05 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 440 previous similar messages Jun 29 08:19:06 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 08:19:06 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 458 previous similar messages Jun 29 08:29:06 fir-md1-s1 kernel: LustreError: 21713:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 08:29:06 fir-md1-s1 kernel: LustreError: 21713:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 469 previous similar messages Jun 29 08:39:07 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 08:39:07 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 462 previous similar messages Jun 29 08:46:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jun 29 08:46:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 08:46:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 82c7213a-dc0a-1b63-00e6-606d680853e3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3a06e57c00, cur 1561823196 expire 1561823046 last 1561822969 Jun 29 08:46:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 08:49:08 fir-md1-s1 kernel: LustreError: 46552:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 08:49:08 fir-md1-s1 kernel: LustreError: 46552:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 458 previous similar messages Jun 29 08:59:09 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 08:59:09 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 519 previous similar messages Jun 29 09:09:10 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 09:09:10 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 543 previous similar messages Jun 29 09:19:10 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 29 09:19:10 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 564 previous similar messages Jun 29 09:29:10 fir-md1-s1 kernel: LustreError: 46593:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 09:29:10 fir-md1-s1 kernel: LustreError: 46593:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 601 previous similar messages Jun 29 09:39:10 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 09:39:10 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 696 previous similar 
messages Jun 29 09:49:12 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 09:49:12 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 665 previous similar messages Jun 29 09:59:12 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 09:59:12 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 651 previous similar messages Jun 29 10:09:13 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 10:09:13 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 646 previous similar messages Jun 29 10:14:58 fir-md1-s1 kernel: sched: RT throttling activated Jun 29 10:19:14 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 10:19:14 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 600 previous similar messages Jun 29 10:29:14 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 10:29:14 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 546 previous similar messages Jun 29 10:39:14 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 10:39:14 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jun 29 10:49:16 fir-md1-s1 kernel: LustreError: 
21294:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 10:49:16 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 555 previous similar messages Jun 29 10:59:17 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 10:59:17 fir-md1-s1 kernel: LustreError: 21540:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 545 previous similar messages Jun 29 11:09:17 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 11:09:17 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 493 previous similar messages Jun 29 11:19:17 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 11:19:17 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 462 previous similar messages Jun 29 11:29:18 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 11:29:18 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 460 previous similar messages Jun 29 11:39:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 11:39:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 442 previous similar messages Jun 29 11:49:19 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 
11:49:19 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 445 previous similar messages Jun 29 11:59:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 11:59:20 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 448 previous similar messages Jun 29 12:09:20 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 12:09:20 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 29 12:19:21 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 12:19:21 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jun 29 12:29:22 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 12:29:22 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 457 previous similar messages Jun 29 12:39:23 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 12:39:23 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 470 previous similar messages Jun 29 12:49:24 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 12:49:24 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 474 previous similar messages Jun 29 12:59:24 
fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 12:59:24 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 29 13:09:24 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 13:09:24 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 472 previous similar messages Jun 29 13:19:25 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 13:19:25 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 13:29:25 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 13:29:25 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 29 13:39:25 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 13:39:25 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 485 previous similar messages Jun 29 13:49:26 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 13:49:26 fir-md1-s1 kernel: LustreError: 46549:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 494 previous similar messages Jun 29 13:51:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9991ec5a-a329-b00c-1b36-c9ef203c13d2 (at 10.8.1.31@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f250077fc00, cur 1561841469 expire 1561841319 last 1561841242 Jun 29 13:51:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 13:51:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jun 29 13:51:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 13:52:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d5a8de86-e2c0-2c49-971c-021289d53cbe (at 10.8.1.31@o2ib6) Jun 29 13:52:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 13:59:28 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 13:59:28 fir-md1-s1 kernel: LustreError: 46533:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 431 previous similar messages Jun 29 14:09:28 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 14:09:28 fir-md1-s1 kernel: LustreError: 46578:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 29 14:19:28 fir-md1-s1 kernel: LustreError: 21565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 14:19:28 fir-md1-s1 kernel: LustreError: 21565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 401 previous similar messages Jun 29 14:29:29 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 14:29:29 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 29 14:39:29 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 
claims 155648 GRANT, real grant 0 Jun 29 14:39:29 fir-md1-s1 kernel: LustreError: 27603:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 404 previous similar messages Jun 29 14:43:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 3a81a303-95c2-a3aa-25be-f3ca1eccf64d (at 10.9.104.31@o2ib4) Jun 29 14:43:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 14:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 18cb183b-1663-4392-4f25-4d4a8c1aacaa (at 10.9.104.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4519081800, cur 1561844648 expire 1561844498 last 1561844421 Jun 29 14:44:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 14:49:31 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 14:49:31 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 29 14:59:32 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 14:59:32 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jun 29 15:09:32 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 15:09:32 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 407 previous similar messages Jun 29 15:19:33 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 15:19:33 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 29 15:29:34 fir-md1-s1 kernel: 
LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 15:29:34 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 396 previous similar messages Jun 29 15:39:35 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 15:39:35 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 436 previous similar messages Jun 29 15:49:35 fir-md1-s1 kernel: LustreError: 44038:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 15:49:35 fir-md1-s1 kernel: LustreError: 44038:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 29 15:59:35 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 15:59:35 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 412 previous similar messages Jun 29 16:09:37 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 16:09:37 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 414 previous similar messages Jun 29 16:19:37 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 16:19:37 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 439 previous similar messages Jun 29 16:29:38 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real 
grant 0 Jun 29 16:29:38 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 446 previous similar messages Jun 29 16:39:39 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 16:39:39 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 29 16:49:40 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 16:49:40 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 453 previous similar messages Jun 29 16:59:41 fir-md1-s1 kernel: LustreError: 46552:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 16:59:41 fir-md1-s1 kernel: LustreError: 46552:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 446 previous similar messages Jun 29 17:09:41 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 17:09:41 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jun 29 17:19:42 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 17:19:42 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 474 previous similar messages Jun 29 17:29:43 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 17:29:43 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 475 previous similar messages 
Jun 29 17:39:43 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 17:39:43 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 477 previous similar messages Jun 29 17:49:43 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 17:49:43 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 17:59:43 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 17:59:43 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 18:09:44 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 18:09:44 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 474 previous similar messages Jun 29 18:19:44 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 18:19:44 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 470 previous similar messages Jun 29 18:29:46 fir-md1-s1 kernel: LustreError: 21565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 18:29:46 fir-md1-s1 kernel: LustreError: 21565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 18:39:46 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 18:39:46 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 18:49:46 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 18:49:46 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jun 29 18:59:46 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 18:59:46 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 467 previous similar messages Jun 29 19:09:47 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 19:09:47 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 436 previous similar messages Jun 29 19:19:47 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 19:19:47 fir-md1-s1 kernel: LustreError: 25632:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 440 previous similar messages Jun 29 19:29:48 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 19:29:48 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 418 previous similar messages Jun 29 19:39:50 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 19:39:50 fir-md1-s1 kernel: LustreError: 
20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 383 previous similar messages Jun 29 19:49:50 fir-md1-s1 kernel: LustreError: 46563:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 19:49:50 fir-md1-s1 kernel: LustreError: 46563:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 416 previous similar messages Jun 29 19:59:51 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 19:59:51 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 390 previous similar messages Jun 29 20:09:52 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 20:09:52 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 371 previous similar messages Jun 29 20:19:53 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 20:19:53 fir-md1-s1 kernel: LustreError: 21294:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 372 previous similar messages Jun 29 20:29:54 fir-md1-s1 kernel: LustreError: 44040:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 20:29:54 fir-md1-s1 kernel: LustreError: 44040:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 397 previous similar messages Jun 29 20:39:55 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 20:39:55 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 425 previous similar messages Jun 29 20:49:55 fir-md1-s1 kernel: LustreError: 
46564:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 20:49:55 fir-md1-s1 kernel: LustreError: 46564:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 440 previous similar messages Jun 29 20:59:56 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 20:59:56 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 485 previous similar messages Jun 29 21:09:57 fir-md1-s1 kernel: LustreError: 46564:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 21:09:57 fir-md1-s1 kernel: LustreError: 46564:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 480 previous similar messages Jun 29 21:14:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d9a5b23f-bd1d-b214-10be-ab41be0e273e (at 10.9.108.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252426c000, cur 1561868065 expire 1561867915 last 1561867838 Jun 29 21:14:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 29 21:14:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5c0904b9-a746-baa3-6518-92bf7219376b (at 10.9.108.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2533274c00, cur 1561868067 expire 1561867917 last 1561867840 Jun 29 21:19:57 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 21:19:57 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jun 29 21:29:59 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 21:29:59 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 449 previous similar messages Jun 29 21:40:00 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 21:40:00 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 440 previous similar messages Jun 29 21:50:00 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 21:50:00 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 442 previous similar messages Jun 29 22:00:00 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 22:00:00 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 431 previous similar messages Jun 29 22:10:01 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 22:10:01 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 419 previous similar messages Jun 29 22:20:01 fir-md1-s1 kernel: LustreError: 
46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 22:20:01 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 438 previous similar messages Jun 29 22:30:01 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 22:30:01 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 446 previous similar messages Jun 29 22:40:02 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 22:40:02 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jun 29 22:50:02 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 22:50:02 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 453 previous similar messages Jun 29 23:00:03 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 23:00:03 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 436 previous similar messages Jun 29 23:10:03 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 23:10:03 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 438 previous similar messages Jun 29 23:20:04 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 
23:20:04 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 435 previous similar messages Jun 29 23:30:04 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 23:30:04 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 438 previous similar messages Jun 29 23:39:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7effe21a-1be3-b078-9c02-424c4e1d26a9 (at 10.9.106.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2521a47000, cur 1561876792 expire 1561876642 last 1561876565 Jun 29 23:39:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 29 23:40:04 fir-md1-s1 kernel: LustreError: 44040:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 23:40:04 fir-md1-s1 kernel: LustreError: 44040:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 478 previous similar messages Jun 29 23:50:05 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jun 29 23:50:05 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 441 previous similar messages Jun 30 00:42:50 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 30 00:42:50 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jun 30 00:44:15 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 30 00:44:15 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous 
similar messages Jun 30 00:46:48 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 30 00:46:48 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 6 previous similar messages Jun 30 00:51:52 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 30 00:51:52 fir-md1-s1 kernel: LustreError: 21497:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 11 previous similar messages Jun 30 01:11:01 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 30 01:11:01 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jun 30 01:21:16 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 30 01:21:16 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 25 previous similar messages Jun 30 01:39:19 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 30 01:39:19 fir-md1-s1 kernel: LustreError: 27585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jun 30 01:49:20 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 30 01:49:20 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 41 previous similar messages Jun 30 02:06:37 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jun 30 02:06:37 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jun 30 02:23:16 fir-md1-s1 kernel: LustreError: 21744:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jun 30 02:23:16 fir-md1-s1 kernel: LustreError: 21744:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 69 previous similar messages Jun 30 02:33:35 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 30 02:33:35 fir-md1-s1 kernel: LustreError: 21516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 236 previous similar messages Jun 30 02:43:38 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 02:43:38 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 186 previous similar messages Jun 30 03:00:12 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jun 30 03:00:12 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31 previous similar messages Jun 30 03:10:21 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 30 03:10:21 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 178 previous similar messages Jun 30 03:20:26 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 03:20:26 fir-md1-s1 kernel: LustreError: 
46530:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 33 previous similar messages Jun 30 03:48:55 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 03:48:55 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43 previous similar messages Jun 30 03:52:10 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 03:52:10 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 04:10:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 04:13:48 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 04:13:48 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 04:31:56 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 04:32:21 fir-md1-s1 kernel: LustreError: 46528:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 05:42:43 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jun 30 05:42:43 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jun 30 05:42:44 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca 
claims 155648 GRANT, real grant 0 Jun 30 05:42:44 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31 previous similar messages Jun 30 05:42:47 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 30 05:42:47 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jun 30 05:43:12 fir-md1-s1 kernel: LustreError: 46579:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 05:43:12 fir-md1-s1 kernel: LustreError: 46579:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 30 05:43:22 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 05:43:22 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 26 previous similar messages Jun 30 05:43:41 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jun 30 05:43:41 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 26 previous similar messages Jun 30 05:44:20 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 30 05:44:20 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 179 previous similar messages Jun 30 05:45:50 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jun 30 05:45:50 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 40 previous 
similar messages Jun 30 05:49:18 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jun 30 05:49:18 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 65 previous similar messages Jun 30 05:54:47 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 155648 GRANT, real grant 0 Jun 30 05:54:47 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 39 previous similar messages Jun 30 06:06:11 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jun 30 06:06:11 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 373 previous similar messages Jun 30 06:16:14 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 06:16:14 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jun 30 06:29:04 fir-md1-s1 kernel: LustreError: 21565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 06:29:04 fir-md1-s1 kernel: LustreError: 21565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jun 30 06:39:31 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 06:39:31 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jun 30 06:49:34 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 06:49:34 fir-md1-s1 kernel: LustreError: 27602:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 18 previous similar messages Jun 30 07:00:03 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 07:00:03 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 130 previous similar messages Jun 30 07:10:07 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 07:10:07 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 95 previous similar messages Jun 30 07:22:13 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 07:22:13 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 39 previous similar messages Jun 30 07:35:01 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 07:45:41 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 07:45:41 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 59 previous similar messages Jun 30 07:56:34 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 07:56:34 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jun 30 08:11:33 fir-md1-s1 kernel: LustreError: 
21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 08:11:33 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jun 30 08:30:56 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 08:30:56 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 30 08:44:10 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 08:44:10 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 30 08:56:08 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 08:56:08 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 6 previous similar messages Jun 30 09:11:21 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 09:11:21 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 09:36:49 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 09:36:49 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 30 09:41:52 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 09:41:52 
fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 09:46:23 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 09:46:23 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 30 10:01:28 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 10:16:25 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 10:16:25 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jun 30 10:29:20 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 10:46:25 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 10:46:25 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 30 11:11:12 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 11:11:12 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 30 11:16:34 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 11:16:34 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous 
similar message Jun 30 11:22:16 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 11:22:16 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jun 30 11:31:27 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 11:31:27 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jun 30 11:58:12 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 11:58:12 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 63 previous similar messages Jun 30 12:10:10 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 12:10:10 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 12:10:31 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 155648 GRANT, real grant 0 Jun 30 12:23:14 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 12:23:14 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jun 30 12:23:19 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 12:23:19 fir-md1-s1 kernel: LustreError: 
22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jun 30 12:23:24 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 12:23:24 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jun 30 12:23:58 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 12:23:58 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jun 30 12:24:19 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 12:24:19 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jun 30 12:24:59 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 12:24:59 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jun 30 12:26:14 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 12:26:14 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 40 previous similar messages Jun 30 12:28:54 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 32768 GRANT, real grant 0 Jun 30 12:28:54 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 93 previous similar messages Jun 30 12:34:15 fir-md1-s1 kernel: LustreError: 
46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 12:34:15 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 122 previous similar messages Jun 30 13:52:28 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 13:52:28 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jun 30 13:53:43 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 13:53:43 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 18 previous similar messages Jun 30 13:56:24 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 13:56:24 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43 previous similar messages Jun 30 14:01:40 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 14:01:40 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 77 previous similar messages Jun 30 14:11:54 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 14:11:54 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 94 previous similar messages Jun 30 14:22:44 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 32768 GRANT, real grant 0 Jun 30 14:22:44 
fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 29 previous similar messages Jun 30 14:32:49 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 14:32:49 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 29 previous similar messages Jun 30 14:42:51 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 14:42:51 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 40 previous similar messages Jun 30 14:53:10 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 14:53:10 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 74 previous similar messages Jun 30 15:03:19 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 15:03:19 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 49 previous similar messages Jun 30 15:13:58 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 79e647da-c3f6-a3be-d8fe-44afe2c61e65 claims 28672 GRANT, real grant 0 Jun 30 15:13:58 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 37 previous similar messages Jun 30 15:24:07 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 5f763aae-b29d-37fb-3cb6-92e44ca397c9 claims 28672 GRANT, real grant 0 Jun 30 15:24:07 fir-md1-s1 kernel: LustreError: 22730:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 30 previous similar messages Jun 30 15:26:46 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: haven't heard from client 4c6d21f6-3e09-6b98-bf50-a29faf23fa85 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1cf169c800, cur 1561933606 expire 1561933456 last 1561933379 Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b91c484f-e487-3764-dc4b-13ef610a985a (at 10.8.26.10@o2ib6) reconnecting Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 83e53a59-cc28-333f-3bd7-6445a9dc9fd5 (at 10.8.18.28@o2ib6) Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.16@o2ib6, removing former export from same NID Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d072205a-1b1b-636c-7696-e9d92af1edee (at 10.8.20.3@o2ib6) reconnecting Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1c6cf01f-00af-d021-7941-fb8c37d4ff7c (at 10.8.20.3@o2ib6) Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.2.17@o2ib6, removing former export from same NID Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jun 30 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: 97638:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933599/real 0] req@ffff8f207d110c00 x1636719279079584/t0(0) o104->fir-MDT0002@10.8.17.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561933606 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: 97638:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 30 15:26:47 
fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.26@o2ib6, removing former export from same NID Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 08a93feb-23b2-0c44-b594-18b6878dec21 (at 10.8.30.18@o2ib6) reconnecting Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9b97a58f-573c-46e2-b9e4-530918c27ae7 (at 10.8.30.18@o2ib6) Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: Skipped 142 previous similar messages Jun 30 15:26:47 fir-md1-s1 kernel: LustreError: 44034:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1db41ba450 x1635086943010656/t0(0) o4->fe414b50-a889-d1c3-c193-5f58a4966fe7@10.8.1.8@o2ib6:29/0 lens 488/448 e 1 to 0 dl 1561933619 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:26:47 fir-md1-s1 kernel: LustreError: 44034:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 17 previous similar messages Jun 30 15:26:47 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jun 30 15:26:48 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:26:48 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1788eaca00 Jun 30 15:26:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with fe414b50-a889-d1c3-c193-5f58a4966fe7 (at 10.8.1.8@o2ib6), client will retry: rc = -110 Jun 30 15:26:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.31.10@o2ib6, removing former export from same NID Jun 30 15:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 19f53614-4ac3-8e1e-1ec0-b9833a2b383f (at 10.8.22.19@o2ib6) reconnecting Jun 30 15:26:49 fir-md1-s1 kernel: Lustre: Skipped 191 previous similar messages Jun 30 15:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1835b66b-5b94-b0fe-f70f-6ec070b9ba03 (at 
10.8.22.19@o2ib6) Jun 30 15:26:49 fir-md1-s1 kernel: Lustre: Skipped 289 previous similar messages Jun 30 15:26:49 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jun 30 15:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 018b4088-9100-7f5b-2709-38dd7f461ac7 (at 10.8.8.29@o2ib6) reconnecting Jun 30 15:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c952479e-eacc-a158-a9f3-c256f0987c93 (at 10.8.23.32@o2ib6) Jun 30 15:26:53 fir-md1-s1 kernel: Lustre: Skipped 430 previous similar messages Jun 30 15:26:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.32@o2ib6, removing former export from same NID Jun 30 15:26:53 fir-md1-s1 kernel: Lustre: Skipped 138 previous similar messages Jun 30 15:26:53 fir-md1-s1 kernel: Lustre: Skipped 290 previous similar messages Jun 30 15:26:54 fir-md1-s1 kernel: Lustre: 24585:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f15b05dfb00 x1631310231321104/t0(0) o101->2defae61-8bf0-dee6-7d48-53b83a69e973@10.8.17.24@o2ib6:29/0 lens 480/568 e 1 to 0 dl 1561933619 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d3e22dd2-d25d-28e8-5f86-5d27043eaa8d (at 10.8.7.18@o2ib6) reconnecting Jun 30 15:27:01 fir-md1-s1 kernel: Lustre: Skipped 461 previous similar messages Jun 30 15:27:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.9@o2ib6, removing former export from same NID Jun 30 15:27:01 fir-md1-s1 kernel: Lustre: Skipped 230 previous similar messages Jun 30 15:27:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0bcd0825-5f35-e709-e57c-d41ae345f214 (at 10.8.23.9@o2ib6) Jun 30 15:27:01 fir-md1-s1 kernel: Lustre: Skipped 695 previous similar messages Jun 30 15:27:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5cad2422-3e98-66d4-e9e4-0ce15d870f56 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0d62f99800, cur 1561933622 expire 1561933472 last 1561933395 Jun 30 15:27:06 fir-md1-s1 kernel: Lustre: 44034:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1f5620a850 x1631558639360624/t0(0) o4->84fd8c4b-6545-cd41-282d-ef5f651cba30@10.8.17.11@o2ib6:11/0 lens 488/448 e 1 to 0 dl 1561933631 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:09 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1f0f920050 x1634321379548032/t0(0) o4->545f12c1-4799-a254-b9c4-f75f43e1bc5b@10.8.27.23@o2ib6:26/0 lens 488/448 e 1 to 0 dl 1561933646 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:09 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jun 30 15:27:19 fir-md1-s1 kernel: Lustre: 22005:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933632/real 0] req@ffff8f19b179f200 x1636719279124976/t0(0) o104->fir-MDT0002@10.8.8.37@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561933639 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:27:20 fir-md1-s1 kernel: Lustre: 97638:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:21s); client may timeout. 
req@ffff8f15b05dfb00 x1631310231321104/t0(0) o101->2defae61-8bf0-dee6-7d48-53b83a69e973@10.8.17.24@o2ib6:29/0 lens 480/536 e 1 to 0 dl 1561933619 ref 1 fl Complete:/0/0 rc 0/0 Jun 30 15:27:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7b8c2334-5441-fafb-761f-7bfdc2fe1e61 (at 10.8.18.30@o2ib6) reconnecting Jun 30 15:27:20 fir-md1-s1 kernel: Lustre: Skipped 302 previous similar messages Jun 30 15:27:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2ded1f4a-314b-a6a0-d3a0-8acbcea0369c (at 10.8.18.30@o2ib6) Jun 30 15:27:20 fir-md1-s1 kernel: Lustre: Skipped 459 previous similar messages Jun 30 15:27:21 fir-md1-s1 kernel: Lustre: 25635:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1f0f920050 x1634321379548032/t0(0) o4->545f12c1-4799-a254-b9c4-f75f43e1bc5b@10.8.27.23@o2ib6:26/0 lens 488/448 e 1 to 0 dl 1561933646 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.15.3@o2ib6, removing former export from same NID Jun 30 15:27:22 fir-md1-s1 kernel: Lustre: Skipped 154 previous similar messages Jun 30 15:27:27 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:27:27 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f3983ec00 Jun 30 15:27:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6), client will retry: rc = -110 Jun 30 15:27:27 fir-md1-s1 kernel: Lustre: 21516:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:16s); client may timeout. 
req@ffff8f1f5620a850 x1631558639360624/t0(0) o4->84fd8c4b-6545-cd41-282d-ef5f651cba30@10.8.17.11@o2ib6:11/0 lens 488/448 e 1 to 0 dl 1561933631 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jun 30 15:27:28 fir-md1-s1 kernel: Lustre: 22287:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bf1f95400 x1634927464071184/t349380999792(0) o36->a2d1cfa6-4e2d-7226-3700-dc24c44c8e97@10.9.108.16@o2ib4:2/0 lens 488/3152 e 1 to 0 dl 1561933652 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:35 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933648/real 0] req@ffff8f0df9831e00 x1636719279145264/t0(0) o104->fir-MDT0002@10.8.8.37@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561933655 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:27:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.37@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f24971e3a80/0x5d9ee62ab146b59d lrc: 4/0,0 mode: PR/PR res: [0x2c002c268:0x189:0x0].0x0 bits 0x1b/0x0 rrc: 14 type: IBT flags: 0x60200400000020 nid: 10.8.8.37@o2ib6 remote: 0x7c2525caa0dc4085 expref: 4991 pid: 97662 timeout: 1048721 lvb_type: 0 Jun 30 15:27:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 30 15:27:43 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0545373f00 x1631546446279584/t349381007355(0) o36->25c05458-1ff8-5b3c-505b-360943a414ba@10.9.104.66@o2ib4:18/0 lens 488/3152 e 1 to 0 dl 1561933668 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:47 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:27:47 fir-md1-s1 kernel: LustreError: 
20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e7bba7c00 Jun 30 15:27:47 fir-md1-s1 kernel: Lustre: 46560:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:21s); client may timeout. req@ffff8f1f0f920050 x1634321379548032/t0(0) o4->545f12c1-4799-a254-b9c4-f75f43e1bc5b@10.8.27.23@o2ib6:26/0 lens 488/448 e 1 to 0 dl 1561933646 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jun 30 15:27:48 fir-md1-s1 kernel: Lustre: 20378:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933661/real 0] req@ffff8f19b179da00 x1636719279156688/t0(0) o104->fir-MDT0002@10.8.0.68@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561933668 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:27:48 fir-md1-s1 kernel: Lustre: 20378:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 30 15:27:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8ce01d8b-55b4-edf0-189b-0eb92aac6c18 (at 10.8.21.11@o2ib6) Jun 30 15:27:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 486bedd2-dd65-4f17-854a-54ed20ee472c (at 10.8.21.11@o2ib6) reconnecting Jun 30 15:27:52 fir-md1-s1 kernel: Lustre: Skipped 1103 previous similar messages Jun 30 15:27:52 fir-md1-s1 kernel: Lustre: Skipped 1660 previous similar messages Jun 30 15:27:53 fir-md1-s1 kernel: Lustre: 23638:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f396a602d00 x1635088425115888/t349381010483(0) o36->9c7adb50-64f1-6d92-d619-cdf901757223@10.9.108.11@o2ib4:28/0 lens 488/3152 e 1 to 0 dl 1561933678 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:27:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.22@o2ib6, removing former export from same NID Jun 30 15:27:54 fir-md1-s1 kernel: Lustre: Skipped 601 previous similar messages Jun 30 15:28:13 fir-md1-s1 kernel: Lustre: 23613:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't 
add any time (5/5), not sending early reply req@ffff8f17cb07d100 x1631598529828544/t0(0) o101->7f8dc145-a081-da87-1da4-154358301486@10.9.108.1@o2ib4:18/0 lens 576/3264 e 1 to 0 dl 1561933698 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:28:13 fir-md1-s1 kernel: Lustre: 23613:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jun 30 15:28:19 fir-md1-s1 kernel: LustreError: 23682:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.0.65@o2ib6) failed to reply to blocking AST (req@ffff8f3afd38c200 x1636719279149952 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f22e3389f80/0x5d9ee62aa31edf45 lrc: 4/0,0 mode: PR/PR res: [0x200029c3c:0xc95:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.65@o2ib6 remote: 0xf6c4443c4931009d expref: 510356 pid: 21483 timeout: 1048741 lvb_type: 0 Jun 30 15:28:19 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.0.65@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 30 15:28:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.0.68@o2ib6) failed to reply to blocking AST (req@ffff8f19b179da00 x1636719279156688 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f1ee118e780/0x5d9ee62aad882d3e lrc: 4/0,0 mode: PR/PR res: [0x2c002bff7:0x491:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x60000400010020 nid: 10.8.0.68@o2ib6 remote: 0xe8ef120cd2738cc3 expref: 29209 pid: 22280 timeout: 1048751 lvb_type: 0 Jun 30 15:28:40 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.0.68@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 30 15:28:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 59s: evicting client at 10.8.0.68@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1ee118e780/0x5d9ee62aad882d3e lrc: 3/0,0 mode: PR/PR res: [0x2c002bff7:0x491:0x0].0x0 
bits 0x40/0x0 rrc: 5 type: IBT flags: 0x60000400010020 nid: 10.8.0.68@o2ib6 remote: 0xe8ef120cd2738cc3 expref: 29210 pid: 22280 timeout: 0 lvb_type: 0 Jun 30 15:28:46 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-26), not sending early reply req@ffff8f1ebd0add00 x1635088425129824/t0(0) o101->9c7adb50-64f1-6d92-d619-cdf901757223@10.9.108.11@o2ib4:21/0 lens 576/3264 e 0 to 0 dl 1561933731 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:28:46 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Jun 30 15:28:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ac9cd631-a534-1fba-753c-5069b079d1ad (at 10.8.24.16@o2ib6) reconnecting Jun 30 15:28:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5afbb881-550f-fa08-cafd-4158b37c9811 (at 10.8.24.16@o2ib6) Jun 30 15:28:57 fir-md1-s1 kernel: Lustre: Skipped 2584 previous similar messages Jun 30 15:28:57 fir-md1-s1 kernel: Lustre: Skipped 1734 previous similar messages Jun 30 15:28:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.35@o2ib6, removing former export from same NID Jun 30 15:28:59 fir-md1-s1 kernel: Lustre: Skipped 807 previous similar messages Jun 30 15:29:08 fir-md1-s1 kernel: LustreError: 25677:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f42b3bc9b00 x1636719279237744/t0(0) o104->fir-MDT0000@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 15:29:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.65@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1eb9c7af40/0x5d9ee62aabaa4472 lrc: 3/0,0 mode: PR/PR res: [0x200029c3c:0xde5:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.0.65@o2ib6 remote: 0xf6c4443c496a0e5c expref: 269628 pid: 97672 timeout: 1048837 lvb_type: 0 Jun 30 15:29:37 
fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 30 15:29:38 fir-md1-s1 kernel: Lustre: 23683:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933771/real 0] req@ffff8f1329df7800 x1636719279282784/t0(0) o104->fir-MDT0000@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561933778 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:29:56 fir-md1-s1 kernel: Lustre: 23594:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f14a4515100 x1634478624590720/t0(0) o101->e15f364b-b556-833b-9c7c-0e0e1407bf82@10.9.0.62@o2ib4:1/0 lens 1776/3288 e 0 to 0 dl 1561933801 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:29:56 fir-md1-s1 kernel: Lustre: 23594:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 28 previous similar messages Jun 30 15:30:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.3@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f18c074e0c0/0x5d9ee62ab48a8a40 lrc: 4/0,0 mode: PR/PR res: [0x200029c4a:0xb328:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.15.3@o2ib6 remote: 0x6292ab32aa6537dd expref: 11526 pid: 97663 timeout: 1048860 lvb_type: 0 Jun 30 15:30:04 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:30:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.17.12@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1be49a4800/0x5d9ee62ab0affc4d lrc: 4/0,0 mode: PR/PR res: [0x2c002c309:0xc4a1:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x60000400010020 nid: 10.8.17.12@o2ib6 remote: 0xb9a0d20b4227c5b2 expref: 5562 pid: 97664 timeout: 1048879 lvb_type: 0 Jun 30 15:30:19 
fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 30 15:30:38 fir-md1-s1 kernel: LustreError: 25677:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561933748, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f450566d580/0x5d9ee62ab5001b5a lrc: 3/0,1 mode: --/CW res: [0x200029c3c:0xde5:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 25677 timeout: 0 lvb_type: 0 Jun 30 15:31:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d22c54d9-f3ee-e6f8-f34c-cd9ceccbd787 (at 10.8.2.24@o2ib6) reconnecting Jun 30 15:31:05 fir-md1-s1 kernel: Lustre: Skipped 2866 previous similar messages Jun 30 15:31:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 721a064b-1659-db7a-cc36-f67cf8b564bb (at 10.8.2.24@o2ib6) Jun 30 15:31:05 fir-md1-s1 kernel: Lustre: Skipped 4302 previous similar messages Jun 30 15:31:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.15.9@o2ib6, removing former export from same NID Jun 30 15:31:12 fir-md1-s1 kernel: Lustre: Skipped 1419 previous similar messages Jun 30 15:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4343a906-23d9-f729-b768-bcd0549ada0d (at 10.8.8.37@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e67882400, cur 1561933891 expire 1561933741 last 1561933664 Jun 30 15:31:38 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:32:13 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:32:14 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933927/real 0] req@ffff8f161d74bc00 x1636719279463696/t0(0) o106->fir-MDT0000@10.8.18.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1561933934 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:32:14 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jun 30 15:32:38 fir-md1-s1 kernel: Lustre: 22289:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2496f3b600 x1631538936016256/t0(0) o101->769d013d-f990-3399-dde8-f67f737a957d@10.8.7.25@o2ib6:13/0 lens 576/3264 e 1 to 0 dl 1561933963 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:32:38 fir-md1-s1 kernel: Lustre: 22289:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jun 30 15:32:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 30 15:32:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 30 15:32:46 fir-md1-s1 kernel: LustreError: 20730:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f251a7be000 x1636719279554688/t0(0) o104->fir-MDT0000@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 15:32:46 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f2496f3b600 x1631538936016256/t0(0) o101->769d013d-f990-3399-dde8-f67f737a957d@10.8.7.25@o2ib6:13/0 lens 576/536 e 1 to 0 dl 1561933963 ref 1 fl Complete:/0/0 rc 0/0 Jun 30 15:33:16 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:33:16 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jun 30 15:33:23 fir-md1-s1 kernel: Lustre: 21483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561933996/real 0] req@ffff8f161aaf9e00 x1636719279725872/t0(0) o104->fir-MDT0002@10.8.28.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561934003 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:33:23 fir-md1-s1 kernel: Lustre: 21483:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 30 15:33:46 fir-md1-s1 kernel: LustreError: 46576:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f204e271450 x1636413998790944/t0(0) o4->f5114f0b-b017-9912-d44d-f24fe0d2ebc9@10.8.26.33@o2ib6:22/0 lens 488/448 e 1 to 0 dl 1561934032 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:33:59 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:33:59 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, 
status -125, desc ffff8f2055541a00 Jun 30 15:33:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f5114f0b-b017-9912-d44d-f24fe0d2ebc9 (at 10.8.26.33@o2ib6), client will retry: rc = -110 Jun 30 15:33:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 15:33:59 fir-md1-s1 kernel: Lustre: 46576:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:7s); client may timeout. req@ffff8f204e271450 x1636413998790944/t0(0) o4->f5114f0b-b017-9912-d44d-f24fe0d2ebc9@10.8.26.33@o2ib6:22/0 lens 488/448 e 1 to 0 dl 1561934032 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jun 30 15:34:03 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20fb121200 Jun 30 15:34:03 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ce3c76400 Jun 30 15:34:07 fir-md1-s1 kernel: LustreError: 55488:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f4503e59c50 x1636439708308352/t0(0) o256->420c129b-df9e-b1c5-eae5-667fed64bb9d@10.8.15.3@o2ib6:7/0 lens 304/240 e 0 to 0 dl 1561934047 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:34:07 fir-md1-s1 kernel: LustreError: 55488:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jun 30 15:34:10 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3a50f0e000 Jun 30 15:34:11 fir-md1-s1 kernel: LustreError: 55538:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1f5dab7050 x1637349789347104/t0(0) o256->e02527c5-320a-cc02-89d1-b5d3560ed7b2@10.8.0.67@o2ib6:11/0 lens 304/240 e 0 to 0 dl 1561934051 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:34:11 fir-md1-s1 kernel: LustreError: 55538:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 11 previous similar messages Jun 30 15:34:11 fir-md1-s1 kernel: LustreError: 
20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16bccd1e00 Jun 30 15:34:12 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e022fc400 Jun 30 15:34:12 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18b7296e00 Jun 30 15:34:12 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d6ff66000 Jun 30 15:34:13 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bf0b1a200 Jun 30 15:34:13 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a146f6400 Jun 30 15:34:13 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ff601a00 Jun 30 15:34:15 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f164c3c4c00 Jun 30 15:34:17 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3492ad9200 Jun 30 15:34:17 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a146f7400 Jun 30 15:34:18 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3cf9aa4200 Jun 30 15:34:19 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f26a5e9b400 Jun 30 15:34:19 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c68f0fe00 Jun 30 15:34:20 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33e6b8ce00 Jun 30 15:34:20 fir-md1-s1 kernel: 
LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f326fa6c600 Jun 30 15:34:21 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bee721000 Jun 30 15:34:21 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e022ffc00 Jun 30 15:34:21 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f37cd7fb600 Jun 30 15:34:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.7.28@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f16b9204140/0x5d9ee62ab52808e9 lrc: 4/0,0 mode: PW/PW res: [0x2c002bf5b:0x10a2:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.7.28@o2ib6 remote: 0x4d12180afde56068 expref: 912 pid: 22281 timeout: 1049122 lvb_type: 0 Jun 30 15:34:26 fir-md1-s1 kernel: LustreError: 50447:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f24f48d3400 ns: mdt-fir-MDT0002_UUID lock: ffff8f1888aaf980/0x5d9ee62ab535031f lrc: 3/0,0 mode: PW/PW res: [0x2c002bf5b:0x10a2:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.8.7.28@o2ib6 remote: 0x4d12180afde56a94 expref: 3 pid: 50447 timeout: 0 lvb_type: 0 Jun 30 15:35:04 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:35:04 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 24 previous similar messages Jun 30 15:35:05 fir-md1-s1 kernel: LustreError: 22730:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1f0f920050 x1637547176047040/t0(0) o4->4c6d21f6-3e09-6b98-bf50-a29faf23fa85@10.8.9.9@o2ib6:5/0 lens 488/448 e 1 to 0 dl 1561934105 ref 1 fl Interpret:/0/0 rc 0/0 
Jun 30 15:35:05 fir-md1-s1 kernel: LustreError: 22730:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jun 30 15:35:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b16e4006-ad8f-de37-ede7-21e0aff43fcc (at 10.8.1.3@o2ib6) reconnecting Jun 30 15:35:22 fir-md1-s1 kernel: Lustre: Skipped 2840 previous similar messages Jun 30 15:35:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 58f0ff23-7960-f6fe-9a2f-be2834e287bf (at 10.8.1.3@o2ib6) Jun 30 15:35:22 fir-md1-s1 kernel: Lustre: Skipped 3631 previous similar messages Jun 30 15:35:28 fir-md1-s1 kernel: LustreError: 46577:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1f0f927850 x1631584018818144/t0(0) o4->cc57ad24-07f9-6270-9e45-e86bdff220e7@10.8.2.27@o2ib6:28/0 lens 488/448 e 1 to 0 dl 1561934128 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:35:28 fir-md1-s1 kernel: LustreError: 46577:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jun 30 15:35:29 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d78923600 Jun 30 15:35:29 fir-md1-s1 kernel: Lustre: 46577:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f1f0f927850 x1631584018818144/t0(0) o4->cc57ad24-07f9-6270-9e45-e86bdff220e7@10.8.2.27@o2ib6:28/0 lens 488/448 e 1 to 0 dl 1561934128 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jun 30 15:35:29 fir-md1-s1 kernel: Lustre: 46577:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jun 30 15:35:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.35@o2ib6, removing former export from same NID Jun 30 15:35:46 fir-md1-s1 kernel: Lustre: Skipped 774 previous similar messages Jun 30 15:35:52 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561934145/real 0] req@ffff8f18afef3300 x1636719279803936/t0(0) o104->fir-MDT0000@10.8.8.18@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561934152 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:35:52 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jun 30 15:36:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 90d881d2-bbfa-565d-91e5-ddef873ff667 (at 10.9.105.48@o2ib4) in 214 seconds. I think it's dead, and I am evicting it. exp ffff8f2523409400, cur 1561934170 expire 1561934020 last 1561933956 Jun 30 15:36:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 15:36:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 30 15:36:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 30 15:36:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f05bc850-7a22-d5dd-120f-662214ba49f9 (at 10.9.105.48@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fe7cd000, cur 1561934183 expire 1561934033 last 1561933956 Jun 30 15:37:26 fir-md1-s1 kernel: Lustre: 27481:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1db41ba850 x1636416686482832/t0(0) o4->b5d37fef-ba24-e714-aa45-15692218e88e@10.8.1.20@o2ib6:0/0 lens 488/448 e 1 to 0 dl 1561934250 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:37:26 fir-md1-s1 kernel: Lustre: 27481:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 54 previous similar messages Jun 30 15:37:30 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1db41ba850 x1636416686482832/t0(0) o4->b5d37fef-ba24-e714-aa45-15692218e88e@10.8.1.20@o2ib6:0/0 lens 488/448 e 1 to 0 dl 1561934250 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:37:43 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:37:43 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 23 previous similar messages Jun 30 15:37:43 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bba41dc00 Jun 30 15:37:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with b5d37fef-ba24-e714-aa45-15692218e88e (at 10.8.1.20@o2ib6), client will retry: rc = -110 Jun 30 15:37:43 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jun 30 15:37:43 fir-md1-s1 kernel: Lustre: 46581:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:13s); client may timeout. 
req@ffff8f1db41ba850 x1636416686482832/t0(0) o4->b5d37fef-ba24-e714-aa45-15692218e88e@10.8.1.20@o2ib6:0/0 lens 488/448 e 1 to 0 dl 1561934250 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jun 30 15:37:43 fir-md1-s1 kernel: Lustre: 46581:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jun 30 15:37:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.28.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f177ff38fc0/0x5d9ee62ab52f055b lrc: 4/0,0 mode: PR/PR res: [0x2c002c279:0x18eb0:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.28.11@o2ib6 remote: 0xfc47ba20c0093b9a expref: 1111 pid: 97667 timeout: 1049325 lvb_type: 0 Jun 30 15:37:47 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b5d8d4a00 Jun 30 15:37:56 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1cd30c0a00 Jun 30 15:37:56 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e010d600 Jun 30 15:37:56 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2085063000 Jun 30 15:37:56 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1cd30c2800 Jun 30 15:37:56 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e010f200 Jun 30 15:38:01 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ccbbe0e00 Jun 30 15:38:02 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e442abc00 Jun 30 15:38:08 fir-md1-s1 kernel: LustreError: 
25085:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.65@o2ib6 arrived at 1561934288 with bad export cookie 6746082362947164325 Jun 30 15:38:08 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15dc3e8600 Jun 30 15:38:09 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24cf716e00 Jun 30 15:38:12 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d447fe200 Jun 30 15:38:14 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2008224600 Jun 30 15:38:16 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b052afc00 Jun 30 15:38:23 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16a7ab3e00 Jun 30 15:38:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.68@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f15804d3a80/0x5d9ee62ab55464f6 lrc: 4/0,0 mode: PR/PR res: [0x2c00271dd:0x2c18:0x0].0x0 bits 0x13/0x0 rrc: 161 type: IBT flags: 0x60200400000020 nid: 10.8.0.68@o2ib6 remote: 0xe8ef120cd27956b6 expref: 83 pid: 20462 timeout: 1049364 lvb_type: 0 Jun 30 15:38:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 30 15:38:25 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f222cf14600 Jun 30 15:38:26 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dc52a3a00 Jun 30 15:38:26 fir-md1-s1 kernel: LustreError: 23104:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 
10.8.0.68@o2ib6 arrived at 1561934306 with bad export cookie 6746082362948369704 Jun 30 15:38:27 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f247faa7000 Jun 30 15:38:56 fir-md1-s1 kernel: LustreError: 21516:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1f56209850 x1637547176047040/t0(0) o4->4c6d21f6-3e09-6b98-bf50-a29faf23fa85@10.8.9.9@o2ib6:0/0 lens 488/448 e 1 to 0 dl 1561934340 ref 1 fl Interpret:/2/0 rc 0/0 Jun 30 15:38:56 fir-md1-s1 kernel: LustreError: 21516:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jun 30 15:39:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 30 15:39:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 30 15:40:13 fir-md1-s1 kernel: LustreError: 46579:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1f5620ac50 x1631566036542976/t0(0) o4->0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a@10.8.8.11@o2ib6:13/0 lens 488/448 e 0 to 0 dl 1561934413 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:40:13 fir-md1-s1 kernel: LustreError: 46579:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 17 previous similar messages Jun 30 15:40:17 fir-md1-s1 kernel: Lustre: 20729:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1561934405/real 0] req@ffff8f1d73b8ec00 x1636719280117968/t0(0) o104->fir-MDT0000@10.8.18.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561934417 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 15:40:17 fir-md1-s1 kernel: Lustre: 20729:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Jun 30 15:40:18 fir-md1-s1 kernel: LustreError: 22289:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.29.3@o2ib6) failed to reply to blocking AST 
(req@ffff8f1e29c0c800 x1636719280087376 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f18a8227740/0x5d9ee62ab5670d8f lrc: 4/0,0 mode: EX/EX res: [0x2c002be65:0x19553:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.8.29.3@o2ib6 remote: 0x96ea67d338c51e0e expref: 881 pid: 21457 timeout: 1049472 lvb_type: 3 Jun 30 15:40:18 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.29.3@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jun 30 15:40:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.8.29.3@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f18a8227740/0x5d9ee62ab5670d8f lrc: 3/0,0 mode: EX/EX res: [0x2c002be65:0x19553:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.8.29.3@o2ib6 remote: 0x96ea67d338c51e0e expref: 882 pid: 21457 timeout: 0 lvb_type: 3 Jun 30 15:40:18 fir-md1-s1 kernel: Lustre: 22289:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:6s); client may timeout. 
req@ffff8f22eea4b300 x1636442429901408/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:12/0 lens 376/312 e 0 to 0 dl 1561934412 ref 1 fl Complete:/0/0 rc 0/0 Jun 30 15:40:18 fir-md1-s1 kernel: Lustre: 22289:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jun 30 15:40:18 fir-md1-s1 kernel: LustreError: 55142:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f098026e300 x1636719280127728/t0(0) o105->fir-MDT0002@10.8.29.3@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 15:40:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20b5394000 Jun 30 15:40:20 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34c4ee7e00 Jun 30 15:40:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 4343a906-23d9-f729-b768-bcd0549ada0d (at 10.8.8.37@o2ib6), client will retry: rc -110 Jun 30 15:40:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 15:40:22 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f26a5e9c800 Jun 30 15:40:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bdbe50a00 Jun 30 15:40:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2029a79200 Jun 30 15:40:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bdbe51200 Jun 30 15:40:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17e18f6e00 Jun 30 15:40:36 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20c4bd2200 Jun 30 15:40:39 fir-md1-s1 kernel: 
LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19ba722a00 Jun 30 15:40:45 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f45194a00 Jun 30 15:40:45 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e66662600 Jun 30 15:40:45 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e66663200 Jun 30 15:40:45 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b42657600 Jun 30 15:40:45 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d2f29bc00 Jun 30 15:40:46 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ee8496e00 Jun 30 15:40:46 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f104436f600 Jun 30 15:40:47 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dcbfff000 Jun 30 15:40:47 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2509db6e00 Jun 30 15:40:47 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dcbfff800 Jun 30 15:41:01 fir-md1-s1 kernel: LustreError: 50444:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561934371, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1675ae8fc0/0x5d9ee62ab566aaa1 lrc: 3/0,1 mode: --/PW res: [0x2c002c30a:0xda:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50444 timeout: 0 
lvb_type: 0 Jun 30 15:41:04 fir-md1-s1 kernel: LustreError: 20462:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561934374, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1e433c4a40/0x5d9ee62ab566dc0a lrc: 3/0,1 mode: --/PW res: [0x2c002c2eb:0x31d:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20462 timeout: 0 lvb_type: 0 Jun 30 15:42:00 fir-md1-s1 kernel: LustreError: 50444:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f196e53d800 ns: mdt-fir-MDT0002_UUID lock: ffff8f1675ae8fc0/0x5d9ee62ab566aaa1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c30a:0xda:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.8.9.9@o2ib6 remote: 0x4e1a1dc9d9bcc5a2 expref: 3 pid: 50444 timeout: 0 lvb_type: 0 Jun 30 15:42:06 fir-md1-s1 kernel: LustreError: 20368:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_convert from 10.8.9.9@o2ib6 arrived at 1561934526 with bad export cookie 6746082362948642522 Jun 30 15:42:07 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jun 30 15:42:07 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 55 previous similar messages Jun 30 15:42:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 30 15:42:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jun 30 15:43:22 fir-md1-s1 kernel: LustreError: 46579:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1f5620c450 x1637547176047040/t0(0) o4->4c6d21f6-3e09-6b98-bf50-a29faf23fa85@10.8.9.9@o2ib6:22/0 lens 488/448 e 1 to 0 dl 1561934602 ref 1 fl Interpret:/2/0 rc 0/0 Jun 30 15:43:22 fir-md1-s1 kernel: LustreError: 46579:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 17 previous similar messages Jun 30 15:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 4c6d21f6-3e09-6b98-bf50-a29faf23fa85 (at 10.8.9.9@o2ib6), client will retry: rc = -110 Jun 30 15:43:22 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jun 30 15:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) reconnecting Jun 30 15:44:10 fir-md1-s1 kernel: Lustre: Skipped 176 previous similar messages Jun 30 15:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Jun 30 15:44:10 fir-md1-s1 kernel: Lustre: Skipped 211 previous similar messages Jun 30 15:44:17 fir-md1-s1 kernel: LustreError: 27580:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1f5620ac50 x1634457905713520/t0(0) o3->1e4e71a5-88c6-3b3c-8591-f6af96f4c86f@10.8.1.28@o2ib6:16/0 lens 488/440 e 0 to 0 dl 1561934686 ref 1 fl Interpret:/0/0 rc 0/0 Jun 30 15:44:17 fir-md1-s1 kernel: LustreError: 27580:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jun 30 15:44:23 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c0f854e00 Jun 30 15:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 1e4e71a5-88c6-3b3c-8591-f6af96f4c86f (at 10.8.1.28@o2ib6), client will retry: rc -110 Jun 30 15:44:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection 
from 10.8.9.9@o2ib6, removing former export from same NID Jun 30 15:44:31 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jun 30 15:44:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 30 15:44:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 30 15:45:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1dcd61d580/0x5d9ee62ab47759e4 lrc: 3/0,0 mode: PR/PR res: [0x200025db9:0x1e79:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x4e1a1dc9d9463516 expref: 511061 pid: 24585 timeout: 1049777 lvb_type: 0 Jun 30 15:45:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 30 15:45:34 fir-md1-s1 kernel: LustreError: 50444:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1f49ea8c00 x1636719280618928/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 15:45:59 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1f30fad100 x1637547237343424/t0(0) o101->4c6d21f6-3e09-6b98-bf50-a29faf23fa85@10.8.9.9@o2ib6:4/0 lens 480/568 e 0 to 0 dl 1561934764 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 15:45:59 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 100 previous similar messages Jun 30 15:46:18 fir-md1-s1 kernel: LustreError: 50447:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed 
out (enqueued at 1561934688, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f16f47f9f80/0x5d9ee62ab5a2e4bb lrc: 3/0,1 mode: --/CW res: [0x200025db9:0x1e79:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50447 timeout: 0 lvb_type: 0 Jun 30 15:46:29 fir-md1-s1 kernel: LustreError: 97672:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f20e276a700 x1636719280693616/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 15:46:29 fir-md1-s1 kernel: LustreError: 97672:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Jun 30 15:47:04 fir-md1-s1 kernel: LustreError: 50444:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561934734, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f24067ee780/0x5d9ee62ab5b5a5bf lrc: 3/0,1 mode: --/PW res: [0x200029c58:0x8a1:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50444 timeout: 0 lvb_type: 0 Jun 30 15:47:21 fir-md1-s1 kernel: LustreError: 50444:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1f49eab000 x1636719280753872/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 15:47:59 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561934789, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1605a0c140/0x5d9ee62ab5f15d64 lrc: 3/0,1 mode: --/PW res: [0x200029c58:0x894:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97672 timeout: 0 lvb_type: 0 Jun 30 15:48:51 fir-md1-s1 kernel: LustreError: 
50444:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561934841, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f163ae4f740/0x5d9ee62ab60ba639 lrc: 3/0,1 mode: --/PW res: [0x200029c58:0x8a0:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50444 timeout: 0 lvb_type: 0 Jun 30 16:15:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4c6d21f6-3e09-6b98-bf50-a29faf23fa85 (at 10.8.9.9@o2ib6) reconnecting Jun 30 16:15:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.9@o2ib6, removing former export from same NID Jun 30 16:15:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 16:15:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 30 16:15:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jun 30 16:15:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jun 30 16:18:42 fir-md1-s1 kernel: Lustre: 21481:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18456b5d00 x1636442616557568/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:17/0 lens 480/568 e 1 to 0 dl 1561936727 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 16:18:42 fir-md1-s1 kernel: Lustre: 21481:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jun 30 16:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9eed212b-34d9-6e26-f1ac-cdc452decf97 (at 10.8.29.3@o2ib6) reconnecting Jun 30 16:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 86af1d07-9f84-ff94-71a6-68fd12f8c1ac (at 10.8.29.3@o2ib6) Jun 30 16:18:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jun 30 16:18:51 fir-md1-s1 kernel: Lustre: 22289:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for 
slow reply: [sent 1561936722/real 1561936722] req@ffff8f1a1da4dd00 x1636719283776080/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561936731 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 16:18:51 fir-md1-s1 kernel: Lustre: 22289:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Jun 30 18:19:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 584e20c4-52de-2973-da1d-0e2ebca7e50e (at 10.9.104.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1465b3f800, cur 1561943942 expire 1561943792 last 1561943715 Jun 30 18:19:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 18:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4120f4aa-15d3-15d6-3436-73087cc4dacd (at 10.9.104.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0a8b296c00, cur 1561943947 expire 1561943797 last 1561943720 Jun 30 18:19:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 18:35:02 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jun 30 18:35:02 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jun 30 18:36:27 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jun 30 18:36:27 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 100 previous similar messages Jun 30 18:39:28 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jun 30 18:39:28 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jun 30 18:48:23 fir-md1-s1 kernel: LustreError: 
46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jun 30 18:48:23 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jun 30 19:20:53 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jun 30 19:20:53 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 19:25:34 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jun 30 19:38:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jun 30 19:38:55 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 19:38:55 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jun 30 19:39:36 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 19:39:36 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 74 previous similar messages Jun 30 19:44:00 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 19:44:00 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jun 30 19:54:58 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 
claims 28672 GRANT, real grant 0 Jun 30 19:54:58 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 21:10:24 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561954217/real 1561954217] req@ffff8f1621e07b00 x1636719604468016/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561954224 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 21:10:38 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1561954231/real 1561954231] req@ffff8f1c837be900 x1636719604789072/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1561954238 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 30 21:10:38 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 30 21:10:38 fir-md1-s1 kernel: LustreError: 24577:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.9@o2ib6) returned error from blocking AST (req@ffff8f1c837be900 x1636719604789072 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f202add4a40/0x5d9ee62b06538b43 lrc: 4/0,0 mode: PR/PR res: [0x200029c10:0x348:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x4e1a1dca18a94de1 expref: 1589089 pid: 97642 timeout: 1069327 lvb_type: 0 Jun 30 21:10:38 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.9@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Jun 30 21:10:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 7s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f202add4a40/0x5d9ee62b06538b43 lrc: 3/0,0 mode: PR/PR res: [0x200029c10:0x348:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 
remote: 0x4e1a1dca18a94de1 expref: 1589090 pid: 97642 timeout: 0 lvb_type: 0 Jun 30 21:10:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Jun 30 21:10:38 fir-md1-s1 kernel: LustreError: 20720:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f162d6d4e00 x1636719604960880/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 21:10:56 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f201812b000 x1634311306061824/t0(0) o36->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:1/0 lens 488/3152 e 0 to 0 dl 1561954261 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 21:11:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 30 21:11:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 30 21:11:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jun 30 21:11:27 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-29), not sending early reply req@ffff8f1b0cf08000 x1634311306064976/t0(0) o101->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:2/0 lens 480/568 e 0 to 0 dl 1561954292 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 21:11:31 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5cad2422-3e98-66d4-e9e4-0ce15d870f56 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3bdae53c00, cur 1561954291 expire 1561954141 last 1561954064 Jun 30 21:11:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 30 21:11:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c6d21f6-3e09-6b98-bf50-a29faf23fa85 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f204a384000, cur 1561954299 expire 1561954149 last 1561954072 Jun 30 21:12:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 30 21:12:04 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 30 21:12:07 fir-md1-s1 kernel: LustreError: 21447:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1cbb0a8f00 x1636719607006880/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 21:12:08 fir-md1-s1 kernel: LustreError: 24577:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561954238, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1611a9e540/0x5d9ee62b0b964066 lrc: 3/0,1 mode: --/PW res: [0x200029c10:0x348:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24577 timeout: 0 lvb_type: 0 Jun 30 21:12:32 fir-md1-s1 kernel: Lustre: 24578:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1efa717500 x1633733050680096/t0(0) o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:7/0 lens 376/1600 e 0 to 0 dl 1561954357 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 21:12:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 30 21:12:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jun 30 21:12:36 fir-md1-s1 kernel: 
LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f249f61e540/0x5d9ee62b05d6e8ef lrc: 3/0,0 mode: PR/PR res: [0x200029c2b:0x238:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x4e1a1dca17d13b39 expref: 979562 pid: 97642 timeout: 1069416 lvb_type: 0 Jun 30 21:12:36 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 30 21:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 30 21:13:37 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 30 21:13:38 fir-md1-s1 kernel: LustreError: 21447:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f240ca1bf00 x1636719608694528/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 21:13:47 fir-md1-s1 kernel: LustreError: 97643:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2216ee2a00 x1636719608888272/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 21:13:47 fir-md1-s1 kernel: LustreError: 97643:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Jun 30 21:13:52 fir-md1-s1 kernel: LNet: Service thread pid 24577 was inactive for 200.68s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 30 21:13:52 fir-md1-s1 kernel: Pid: 24577, comm: mdt01_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:13:52 fir-md1-s1 kernel: Call Trace: Jun 30 21:13:52 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:13:52 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 30 21:13:52 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:13:52 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:13:52 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:13:52 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:13:52 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:13:52 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 30 21:13:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561954432.24577 Jun 30 21:14:03 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1e32657800 x1636443528380064/t0(0) o101->7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0@10.8.29.6@o2ib6:8/0 lens 480/568 e 0 to 0 dl 1561954448 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 21:14:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 
10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f24eca28d80/0x5d9ee62b05f3f3c8 lrc: 3/0,0 mode: PR/PR res: [0x2000297f7:0x1ae:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x4e1a1dca17fff5bb expref: 778082 pid: 24577 timeout: 1069507 lvb_type: 0 Jun 30 21:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 30 21:14:08 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jun 30 21:15:08 fir-md1-s1 kernel: LustreError: 21447:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561954418, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1dc5285e80/0x5d9ee62b0c200537 lrc: 3/0,1 mode: --/PW res: [0x2000297f7:0x1ae:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21447 timeout: 0 lvb_type: 0 Jun 30 21:15:08 fir-md1-s1 kernel: LustreError: 21447:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jun 30 21:15:13 fir-md1-s1 kernel: LustreError: 97644:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561954423, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1c101572c0/0x5d9ee62b0c248f29 lrc: 3/0,1 mode: --/PW res: [0x200025b09:0x2436:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97644 timeout: 0 lvb_type: 0 Jun 30 21:15:13 fir-md1-s1 kernel: LustreError: 97644:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jun 30 21:15:43 fir-md1-s1 kernel: LustreError: 26254:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f21e3cda100 x1636719610292176/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 
Jun 30 21:15:43 fir-md1-s1 kernel: LustreError: 26254:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jun 30 21:16:11 fir-md1-s1 kernel: Lustre: 20461:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-8), not sending early reply req@ffff8f2503ac3c00 x1634120669143008/t0(0) o101->b37c54be-7fed-724b-d760-c5bd71b2a4e0@10.8.29.5@o2ib6:16/0 lens 1776/3288 e 0 to 0 dl 1561954576 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 21:16:11 fir-md1-s1 kernel: Lustre: 20461:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jun 30 21:16:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jun 30 21:16:12 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jun 30 21:16:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f24e375b180/0x5d9ee62ac8ed1602 lrc: 3/0,0 mode: PR/PR res: [0x200029c6b:0x29:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x4e1a1dc9fab4b610 expref: 557097 pid: 21481 timeout: 1069637 lvb_type: 0 Jun 30 21:16:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Jun 30 21:16:43 fir-md1-s1 kernel: LustreError: 22282:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f188462d700 x1636719610629984/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 21:16:43 fir-md1-s1 kernel: LustreError: 22282:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jun 30 21:16:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jun 30 21:16:43 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar 
messages Jun 30 21:16:59 fir-md1-s1 kernel: LNet: Service thread pid 21447 was inactive for 200.69s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 30 21:16:59 fir-md1-s1 kernel: Pid: 21447, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:16:59 fir-md1-s1 kernel: Call Trace: Jun 30 21:16:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 30 21:16:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:16:59 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jun 30 21:16:59 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jun 30 21:16:59 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jun 30 21:16:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jun 30 21:16:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:16:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:16:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:16:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 30 21:16:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561954619.21447 Jun 30 21:17:02 fir-md1-s1 kernel: LNet: Service thread pid 24581 was inactive for 200.26s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 30 21:17:02 fir-md1-s1 kernel: Pid: 24581, comm: mdt01_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:17:02 fir-md1-s1 kernel: Call Trace: Jun 30 21:17:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:17:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:17:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 30 21:17:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561954622.24581 Jun 30 21:17:02 fir-md1-s1 kernel: Pid: 21434, comm: mdt01_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:17:02 fir-md1-s1 kernel: Call Trace: Jun 30 21:17:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] 
Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 30 21:17:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:17:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:17:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:17:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 30 21:17:04 fir-md1-s1 kernel: LNet: Service thread pid 97644 was inactive for 200.77s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 30 21:17:04 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jun 30 21:17:04 fir-md1-s1 kernel: Pid: 97644, comm: mdt01_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:17:04 fir-md1-s1 kernel: Call Trace: Jun 30 21:17:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 30 21:17:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:17:04 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jun 30 21:17:04 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jun 30 21:17:04 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jun 30 21:17:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jun 30 21:17:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:17:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:17:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:17:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 30 21:17:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561954624.97644 Jun 30 21:17:14 fir-md1-s1 kernel: LustreError: 26254:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561954543, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f16b93bde80/0x5d9ee62b0c9c9a8e lrc: 3/0,1 mode: --/CW res: 
[0x200011529:0x9f44:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26254 timeout: 0 lvb_type: 0 Jun 30 21:17:14 fir-md1-s1 kernel: LustreError: 26254:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Jun 30 21:17:31 fir-md1-s1 kernel: LNet: Service thread pid 21447 completed after 233.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 30 21:17:53 fir-md1-s1 kernel: LNet: Service thread pid 24577 completed after 441.62s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 30 21:17:57 fir-md1-s1 kernel: LustreError: 97671:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1e4de1e000 x1636719611004416/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jun 30 21:17:57 fir-md1-s1 kernel: LustreError: 97671:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jun 30 21:18:34 fir-md1-s1 kernel: LustreError: 97643:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561954624, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1f8fbe9680/0x5d9ee62b0cc2e4f6 lrc: 3/0,1 mode: --/CW res: [0x200025ed5:0x35cc:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97643 timeout: 0 lvb_type: 0 Jun 30 21:18:34 fir-md1-s1 kernel: LustreError: 97643:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jun 30 21:19:04 fir-md1-s1 kernel: LNet: Service thread pid 26254 was inactive for 200.42s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 30 21:19:04 fir-md1-s1 kernel: Pid: 26254, comm: mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:19:04 fir-md1-s1 kernel: Call Trace: Jun 30 21:19:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jun 30 21:19:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:19:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:19:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:19:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Jun 30 21:19:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561954744.26254 Jun 30 21:19:27 fir-md1-s1 kernel: LustreError: 97671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1561954677, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff8f25237fd100/0x5d9ee62b0ce98cfe lrc: 3/0,1 mode: --/CW res: [0x20000fd6b:0x1fd91:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97671 timeout: 0 lvb_type: 0 Jun 30 21:20:25 fir-md1-s1 kernel: LNet: Service thread pid 97643 was inactive for 200.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 30 21:20:25 fir-md1-s1 kernel: Pid: 97643, comm: mdt01_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jun 30 21:20:25 fir-md1-s1 kernel: Call Trace: Jun 30 21:20:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jun 30 21:20:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jun 30 21:20:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jun 30 21:20:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 30 21:20:25 fir-md1-s1 kernel: [] 
0xffffffffffffffff Jun 30 21:20:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1561954825.97643 Jun 30 21:21:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0af4f40a-317e-88ce-7d9c-c4839b78e5a4 (at 10.8.29.6@o2ib6) Jun 30 21:21:24 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jun 30 21:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7b7e9b9d-7d80-a5c4-07fd-dd92cbcbe2f0 (at 10.8.29.6@o2ib6) reconnecting Jun 30 21:21:55 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jun 30 21:21:55 fir-md1-s1 kernel: LNet: Service thread pid 24581 completed after 493.65s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 30 21:22:14 fir-md1-s1 kernel: LNet: Service thread pid 97643 completed after 309.47s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 30 21:22:23 fir-md1-s1 kernel: Lustre: 97644:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-8), not sending early reply req@ffff8f1ed6691800 x1634120669206288/t0(0) o101->b37c54be-7fed-724b-d760-c5bd71b2a4e0@10.8.29.5@o2ib6:28/0 lens 576/3264 e 0 to 0 dl 1561954948 ref 2 fl Interpret:/0/0 rc 0/0 Jun 30 21:22:23 fir-md1-s1 kernel: Lustre: 97644:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jun 30 21:22:33 fir-md1-s1 kernel: LNet: Service thread pid 21434 completed after 531.82s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 30 21:22:33 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jun 30 21:23:21 fir-md1-s1 kernel: LNet: Service thread pid 26254 completed after 457.82s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jun 30 22:04:17 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jun 30 22:04:17 fir-md1-s1 kernel: LustreError: 65760:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 22:04:57 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jun 30 22:04:57 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 30 previous similar messages Jun 30 22:08:29 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 22:08:29 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 56 previous similar messages Jun 30 22:36:00 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 22:38:46 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 22:38:46 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 23:06:25 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 23:09:11 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 23:09:11 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 23:36:49 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 23:39:41 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jun 30 23:39:41 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jun 30 23:55:59 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 01 00:40:58 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 01 00:41:08 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 01 00:41:08 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 00:41:33 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 01 00:41:33 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 00:41:48 fir-md1-s1 kernel: LustreError: 21289:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 01 00:41:48 fir-md1-s1 kernel: LustreError: 21289:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 00:42:13 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 36864 GRANT, real grant 0 Jul 01 00:42:13 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 00:42:43 
fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 36864 GRANT, real grant 0 Jul 01 00:42:43 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 00:43:33 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 36864 GRANT, real grant 0 Jul 01 00:43:33 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 01 00:44:08 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 40960 GRANT, real grant 0 Jul 01 00:44:08 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 01 00:45:14 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 36864 GRANT, real grant 0 Jul 01 00:45:14 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 01 00:47:42 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 01 00:47:42 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 01 00:52:00 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 36864 GRANT, real grant 0 Jul 01 00:52:00 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jul 01 02:43:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f37c3da1-0e56-86e1-dca2-c29b3ae80868 (at 10.9.112.9@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f148b08d800, cur 1561974224 expire 1561974074 last 1561973997 Jul 01 03:38:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5b03b3b6-9c4d-bfc6-6338-bc8f69dac2d7 (at 10.9.105.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2521bca800, cur 1561977502 expire 1561977352 last 1561977275 Jul 01 03:38:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 03:42:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fa50e0be-bf58-6b43-a8b3-a284779ef524 (at 10.8.13.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24ee74e400, cur 1561977752 expire 1561977602 last 1561977525 Jul 01 03:42:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 03:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 81206100-f2ef-5b10-2ad6-4678a9c95a5d (at 10.8.11.16@o2ib6) in 171 seconds. I think it's dead, and I am evicting it. exp ffff8f2522712000, cur 1561977828 expire 1561977678 last 1561977657 Jul 01 03:43:48 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 01 04:46:49 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f453a3d2c00, cur 1561981609 expire 1561981459 last 1561981382 Jul 01 04:46:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 04:50:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Jul 01 04:50:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 01 06:47:39 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 06:47:39 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 01 08:28:09 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 08:28:09 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 01 09:08:23 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 09:08:23 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 09:38:51 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 09:45:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6b8f3c35-570b-9d9c-7deb-30e6f23700dc (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f190e7cf000, cur 1561999549 expire 1561999399 last 1561999322 Jul 01 09:45:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 09:45:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jul 01 09:45:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a336b8b2-1d90-8ceb-26db-5f246ea4b144 (at 10.8.28.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522d64000, cur 1562000931 expire 1562000781 last 1562000704 Jul 01 10:08:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:09:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 749699ee-a0f2-6ab2-f022-71007184e2c9 (at 10.8.8.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1489d17800, cur 1562000951 expire 1562000801 last 1562000724 Jul 01 10:09:11 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 01 10:23:12 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 10:23:12 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 10:24:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6917e80a-ef67-f4c1-8e7b-9c14a42b1479 (at 10.9.106.52@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251fc94000, cur 1562001882 expire 1562001732 last 1562001655 Jul 01 10:24:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 01 10:25:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 003a2101-a39b-a323-1cef-2a0a958a28de (at 10.9.106.22@o2ib4) in 226 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4523a3f400, cur 1562001958 expire 1562001808 last 1562001732 Jul 01 10:25:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:30:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.113.7@o2ib4) Jul 01 10:30:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:31:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1874c4f4-ebcb-1671-9e3e-6934890254c1 (at 10.9.115.6@o2ib4) Jul 01 10:31:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:33:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c0904b9-a746-baa3-6518-92bf7219376b (at 10.9.108.21@o2ib4) Jul 01 10:33:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:35:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jul 01 10:35:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:36:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 01549a7c-4c64-1571-057a-1e929c6f1684 (at 10.8.27.5@o2ib6) Jul 01 10:36:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:36:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a07a243c-4ef8-8b68-a74f-ac2c8e98de57 (at 10.8.23.36@o2ib6) Jul 01 10:36:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:37:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to dee4f99a-2654-25ae-e6ec-cb4bc3f136c5 (at 10.8.28.6@o2ib6) Jul 01 10:37:07 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 01 10:38:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c579ffa9-959a-5f2e-006d-9d0dfdb5fa5a (at 10.8.17.26@o2ib6) Jul 01 10:38:23 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 01 10:41:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d974139a-4a0e-a5af-6c7c-02323898e17e (at 10.8.13.7@o2ib6) Jul 01 10:41:02 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 01 
10:48:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 01 10:48:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 10:54:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ac3cabc8-c0a0-bc39-c3a2-f19e3898f019 (at 10.9.107.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d2e2c1400, cur 1562003641 expire 1562003491 last 1562003414 Jul 01 10:54:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 01 11:17:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 01 11:17:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 01 11:18:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.15.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 01 11:21:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 707ff730-8051-57ea-574b-4ed1b41d91e5 (at 10.9.106.22@o2ib4) Jul 01 11:21:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 11:24:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 98a67850-1b7c-ef40-1816-b3372d04b91a (at 10.9.104.26@o2ib4) Jul 01 11:24:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 11:45:57 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 12:33:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 815d7676-5c34-1cc9-c5dd-bad0fb6e70bb (at 10.8.14.8@o2ib6) Jul 01 12:33:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 01 13:59:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 01 13:59:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 
14:05:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 09178838-ce52-4043-1e0e-21a0c9717f63 (at 10.9.106.52@o2ib4) Jul 01 14:05:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 14:23:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f2595515-8d55-d4e7-ea74-00e6bd9e71d3 (at 10.9.112.9@o2ib4) Jul 01 14:23:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 14:30:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c2a05b4-f659-9028-b43b-812cba74e3fc (at 10.9.106.70@o2ib4) Jul 01 14:30:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 14:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 54e0fe0b-05a8-2283-e1e6-c0953941c584 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3a0cfc5000, cur 1562018398 expire 1562018248 last 1562018171 Jul 01 14:59:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 01 15:00:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 54e0fe0b-05a8-2283-e1e6-c0953941c584 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0517995000, cur 1562018401 expire 1562018251 last 1562018174 Jul 01 15:00:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 01 15:00:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d8cc7b58-ee01-5501-ca65-c659f4724147 (at 10.9.106.54@o2ib4) Jul 01 15:00:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 15:11:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3dc3e1e7-01f9-9795-87bb-84df780116dc (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148a475400, cur 1562019065 expire 1562018915 last 1562018838 Jul 01 15:11:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3dc3e1e7-01f9-9795-87bb-84df780116dc (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2521889800, cur 1562019067 expire 1562018917 last 1562018840 Jul 01 15:11:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 01 15:14:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1af78b07-b135-5fa3-6c26-790cdde827a0 (at 10.9.113.5@o2ib4) Jul 01 15:14:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 15:15:27 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562019320/real 1562019320] req@ffff8f087f2edd00 x1636721314172816/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562019327 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 01 15:15:34 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562019327/real 1562019327] req@ffff8f087f2edd00 x1636721314172816/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562019334 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 01 15:15:35 fir-md1-s1 kernel: Lustre: 21145:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1476de5100 x1636462019526192/t0(0) o101->2d9198da-101c-d19d-2b4a-c0e67a82ee58@10.9.115.13@o2ib4:10/0 lens 1784/3288 e 1 to 0 dl 1562019340 ref 2 fl Interpret:/0/0 rc 0/0 Jul 01 15:15:41 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562019334/real 1562019334] req@ffff8f087f2edd00 x1636721314172816/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562019341 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 01 15:15:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2d9198da-101c-d19d-2b4a-c0e67a82ee58 (at 10.9.115.13@o2ib4) reconnecting Jul 01 15:15:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 01 15:15:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
28a67c6c-68a9-127c-f2e6-9416760ecb77 (at 10.9.115.13@o2ib4) Jul 01 15:15:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 15:15:55 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562019348/real 1562019348] req@ffff8f087f2edd00 x1636721314172816/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562019355 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 01 15:15:55 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 01 15:15:55 fir-md1-s1 kernel: LustreError: 23582:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.4@o2ib6) failed to reply to blocking AST (req@ffff8f087f2edd00 x1636721314172816 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f22b045e0c0/0x5d9ee62c43f856dc lrc: 4/0,0 mode: PR/PR res: [0x2c002c313:0x8a9e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x987fde9b9559811d expref: 1079 pid: 97651 timeout: 1134437 lvb_type: 0 Jul 01 15:15:55 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.15.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 01 15:15:55 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f22b045e0c0/0x5d9ee62c43f856dc lrc: 3/0,0 mode: PR/PR res: [0x2c002c313:0x8a9e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x987fde9b9559811d expref: 1080 pid: 97651 timeout: 0 lvb_type: 0 Jul 01 15:15:55 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 01 15:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e67c9bb1-bbe7-aaeb-bb52-bf9dda890aef (at 10.8.15.4@o2ib6) in 227 
seconds. I think it's dead, and I am evicting it. exp ffff8f25216c1400, cur 1562019523 expire 1562019373 last 1562019296 Jul 01 15:36:25 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 15:36:25 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 15:48:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 15612231-22fb-a7bb-cd40-727fbf0eb380 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3d50240800, cur 1562021306 expire 1562021156 last 1562021079 Jul 01 15:48:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 01 15:48:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jul 01 15:52:47 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 15:52:48 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:52:48 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 01 15:52:49 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:52:49 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 01 15:52:51 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:52:51 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar 
messages Jul 01 15:53:18 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 15:53:18 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 01 15:53:27 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:53:27 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43 previous similar messages Jul 01 15:53:54 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 15:53:54 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 01 15:54:39 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:54:39 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31 previous similar messages Jul 01 15:56:26 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:56:26 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jul 01 15:59:42 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 15:59:42 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jul 01 16:04:30 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 16:04:30 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 01 16:14:00 fir-md1-s1 kernel: LustreError: 69438:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 16:14:00 fir-md1-s1 kernel: LustreError: 69438:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 109 previous similar messages Jul 01 16:30:07 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 16:30:07 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 38 previous similar messages Jul 01 16:41:28 fir-md1-s1 kernel: LustreError: 46579:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 16:41:28 fir-md1-s1 kernel: LustreError: 46579:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 117 previous similar messages Jul 01 16:52:33 fir-md1-s1 kernel: LustreError: 69435:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 16:52:33 fir-md1-s1 kernel: LustreError: 69435:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 11 previous similar messages Jul 01 17:02:37 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 17:02:37 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 111 previous similar messages Jul 01 17:25:39 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 17:25:39 fir-md1-s1 kernel: LustreError: 
46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 01 17:27:07 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 17:27:07 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 97 previous similar messages Jul 01 17:29:48 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 17:29:48 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 01 17:39:39 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 17:39:39 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 34 previous similar messages Jul 01 17:50:03 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 17:50:03 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 105 previous similar messages Jul 01 18:43:14 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 18:43:14 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 29 previous similar messages Jul 01 18:44:45 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 18:44:45 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 70 previous similar messages Jul 01 18:48:55 fir-md1-s1 kernel: LustreError: 
46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 18:48:55 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 18 previous similar messages Jul 01 19:21:03 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 01 19:21:03 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 01 19:32:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 815d7676-5c34-1cc9-c5dd-bad0fb6e70bb (at 10.8.14.8@o2ib6) Jul 01 19:32:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 21:50:07 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 22:48:26 fir-md1-s1 kernel: LustreError: 69435:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 01 22:50:44 fir-md1-s1 kernel: Lustre: 69435:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f217734fc50 x1637891143276192/t0(0) o4->9a853c02-c745-a56d-0dbc-5a9440eb652f@10.8.15.6@o2ib6:19/0 lens 488/448 e 1 to 0 dl 1562046649 ref 2 fl Interpret:/0/0 rc 0/0 Jul 01 22:50:49 fir-md1-s1 kernel: LustreError: 21708:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f217734c850 x1637891143276256/t0(0) o4->9a853c02-c745-a56d-0dbc-5a9440eb652f@10.8.15.6@o2ib6:19/0 lens 488/448 e 1 to 0 dl 1562046649 ref 1 fl Interpret:/0/0 rc 0/0 Jul 01 22:50:49 fir-md1-s1 kernel: LustreError: 46562:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f21c9b4d450 x1637891143276208/t0(0) 
o4->9a853c02-c745-a56d-0dbc-5a9440eb652f@10.8.15.6@o2ib6:19/0 lens 488/448 e 1 to 0 dl 1562046649 ref 1 fl Interpret:/0/0 rc 0/0 Jul 01 22:50:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 9a853c02-c745-a56d-0dbc-5a9440eb652f (at 10.8.15.6@o2ib6), client will retry: rc = -110 Jul 01 22:50:49 fir-md1-s1 kernel: LustreError: 21708:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 01 22:53:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9a853c02-c745-a56d-0dbc-5a9440eb652f (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2515b47000, cur 1562046834 expire 1562046684 last 1562046607 Jul 01 22:53:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 01 22:54:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9a853c02-c745-a56d-0dbc-5a9440eb652f (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23ae905400, cur 1562046856 expire 1562046706 last 1562046629 Jul 01 22:54:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 01 23:23:42 fir-md1-s1 kernel: LustreError: 69438:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 00:22:23 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 02 00:22:23 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 16 previous similar messages Jul 02 00:24:05 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 02 00:24:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 815d7676-5c34-1cc9-c5dd-bad0fb6e70bb (at 10.8.14.8@o2ib6) Jul 02 00:24:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 01:31:01 fir-md1-s1 kernel: 
LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 03:15:56 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 03:17:50 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:23:08 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:24:27 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:24:27 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 02 06:24:57 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:25:37 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:26:17 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:26:53 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:27:10 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:27:46 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:28:59 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:28:59 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 02 06:31:30 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:31:30 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 02 06:35:48 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:35:48 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jul 02 06:44:51 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:44:51 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 02 06:55:02 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 06:55:02 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 02 07:05:04 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 07:05:04 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 02 07:15:25 fir-md1-s1 kernel: LustreError: 
46550:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 07:15:25 fir-md1-s1 kernel: LustreError: 46550:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jul 02 07:25:48 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 07:25:48 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 27 previous similar messages Jul 02 07:27:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client cf80c3b1-3a35-aa95-401d-bdf5eda594e5 (at 10.9.105.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14758ea800, cur 1562077625 expire 1562077475 last 1562077398 Jul 02 07:27:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 55b02c38-d9ce-c2f6-066c-e168569494ff (at 10.9.105.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4505750400, cur 1562077631 expire 1562077481 last 1562077404 Jul 02 07:27:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 02 07:35:56 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 07:35:56 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 02 07:46:33 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 07:46:33 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 11 previous similar messages Jul 02 08:04:54 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 08:04:54 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 02 08:18:43 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 08:18:43 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 02 09:09:34 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:09:34 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 02 09:10:49 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:10:49 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 02 
09:13:42 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:13:42 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jul 02 09:19:15 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:19:15 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 7 previous similar messages Jul 02 09:30:15 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:30:15 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 02 09:40:23 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:40:23 fir-md1-s1 kernel: LustreError: 21736:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 02 09:51:05 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 09:51:05 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 02 10:01:05 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 10:01:05 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 24 previous similar messages Jul 02 10:11:08 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 
GRANT, real grant 0 Jul 02 10:11:08 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 02 10:21:51 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 10:21:51 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 02 10:32:21 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 10:32:21 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 02 10:42:22 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 10:42:22 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 02 10:52:31 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 10:52:31 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 21 previous similar messages Jul 02 11:02:34 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 11:02:34 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 21 previous similar messages Jul 02 11:13:35 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 11:13:35 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 18 previous similar messages 
Jul 02 11:23:42 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 11:23:42 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 02 11:33:48 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 11:33:48 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jul 02 11:43:58 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 11:43:58 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 34 previous similar messages Jul 02 11:53:59 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 11:53:59 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 37 previous similar messages Jul 02 12:04:24 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 12:04:24 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 02 12:14:45 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 12:14:45 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 02 12:24:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 49d65b51-f641-7136-faca-91f4ee67f9ec (at 10.9.0.61@o2ib4) in 227 
seconds. I think it's dead, and I am evicting it. exp ffff8f34fe518000, cur 1562095462 expire 1562095312 last 1562095235 Jul 02 12:25:08 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 12:25:08 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 02 12:57:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6020dc6-5ae0-1fda-6229-432d9300dcb9 (at 10.9.0.61@o2ib4) Jul 02 12:57:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 13:07:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Jul 02 13:07:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 13:30:12 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 13:30:12 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jul 02 13:38:41 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 14:04:51 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 14:06:45 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 16:16:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 088aec52-9508-3401-0290-3c12a91037c4 (at 10.9.106.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25227a8400, cur 1562109386 expire 1562109236 last 1562109159 Jul 02 16:16:26 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 02 16:43:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 18baa8eb-3796-4c59-4335-f1e0f1008b8c (at 10.9.112.8@o2ib4) Jul 02 16:43:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 16:43:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bfd7f797-6fd1-93d6-b01a-220fa07218f9 (at 10.9.112.10@o2ib4) Jul 02 16:43:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 16:44:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 02 16:44:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 16:44:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 29e229ef-0b7d-e0ce-48dd-1c614dad7928 (at 10.9.112.15@o2ib4) Jul 02 16:44:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 16:45:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2eee0836-3c27-6ecc-2655-fed0ce55b4ff (at 10.8.15.4@o2ib6) Jul 02 16:45:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 16:50:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cf80c3b1-3a35-aa95-401d-bdf5eda594e5 (at 10.9.105.33@o2ib4) Jul 02 16:50:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 16:50:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7c3f56db-f273-5d44-6d2d-7a51f76d6b18 (at 10.8.10.25@o2ib6) Jul 02 16:50:41 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 02 16:51:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5eebc5e0-8890-b45f-8a55-b17e54a4b047 (at 10.9.106.8@o2ib4) Jul 02 16:51:24 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 02 17:16:44 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, 
real grant 0 Jul 02 17:23:12 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:29:11 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:48:44 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 02 17:48:54 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:48:59 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:48:59 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jul 02 17:49:09 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:49:09 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 02 17:49:31 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 02 17:49:31 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 45 previous similar messages Jul 02 17:50:31 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:50:31 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 02 17:52:40 fir-md1-s1 kernel: 
LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 02 17:52:40 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 02 17:55:52 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 17:55:52 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 02 18:03:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c0904b9-a746-baa3-6518-92bf7219376b (at 10.9.108.21@o2ib4) Jul 02 18:03:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 02 18:13:56 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 02 18:14:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 09178838-ce52-4043-1e0e-21a0c9717f63 (at 10.9.106.52@o2ib4) Jul 02 18:14:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 18:15:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bb86db3d-e55d-5db8-5c35-0541a49637df (at 10.9.108.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f44fed62400, cur 1562116538 expire 1562116388 last 1562116311 Jul 02 18:15:38 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 02 18:16:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c0904b9-a746-baa3-6518-92bf7219376b (at 10.9.108.21@o2ib4) Jul 02 18:16:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 18:17:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f49314e4-fa04-90d7-0408-2e73086197cd (at 10.9.106.6@o2ib4) Jul 02 18:17:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 18:18:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 749e811b-d2e8-801c-4ade-84f4076c00ba (at 10.9.106.59@o2ib4) Jul 02 18:18:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 18:19:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c2a05b4-f659-9028-b43b-812cba74e3fc (at 10.9.106.70@o2ib4) Jul 02 18:19:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 18:19:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 80fdd8bf-960f-e808-91e0-c54ca3723917 (at 10.9.106.10@o2ib4) Jul 02 18:19:44 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 02 18:35:13 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 02 19:58:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6d83953a-c249-6e7c-b76b-0ad244494f27 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3268e39400, cur 1562122734 expire 1562122584 last 1562122507 Jul 02 19:58:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 02 20:12:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 29e229ef-0b7d-e0ce-48dd-1c614dad7928 (at 10.9.112.15@o2ib4) Jul 02 20:12:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 03 00:59:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 97fc9649-24c2-fb40-27c9-532fdd9ea1ac (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d76f71000, cur 1562140744 expire 1562140594 last 1562140517 Jul 03 00:59:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 01:00:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jul 03 01:00:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 01:20:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1d5c7f14-25b8-ffc5-6646-b0756a504223 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f203b3e0c00, cur 1562142002 expire 1562141852 last 1562141775 Jul 03 01:20:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 01:21:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jul 03 01:21:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 02:18:01 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 02:40:18 fir-md1-s1 kernel: LustreError: 69438:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 02:40:18 fir-md1-s1 kernel: LustreError: 69438:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 03 08:35:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 90d881d2-bbfa-565d-91e5-ddef873ff667 (at 10.9.105.48@o2ib4) Jul 03 08:35:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 10:03:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7ae46df0-95ff-edbe-35f2-1ea841efe69a (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3505e30800, cur 1562173383 expire 1562173233 last 1562173156 Jul 03 10:03:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 10:03:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 22280de3-e127-2943-2417-f27756433740 (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2501b05000, cur 1562173389 expire 1562173239 last 1562173162 Jul 03 10:03:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 10:41:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 22280de3-e127-2943-2417-f27756433740 (at 10.9.0.81@o2ib4) Jul 03 10:41:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 10:43:36 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 10:45:09 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 03 10:45:09 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 03 11:08:13 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562177286/real 1562177286] req@ffff8f40f3e09500 x1636723353746768/t0(0) o106->fir-MDT0002@10.8.1.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562177293 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 11:08:20 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562177293/real 1562177293] req@ffff8f40f3e09500 x1636723353746768/t0(0) o106->fir-MDT0002@10.8.1.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562177300 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 11:08:21 fir-md1-s1 kernel: Lustre: 10309:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3cf6e23300 x1637978060226256/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1562177306 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 11:08:21 fir-md1-s1 kernel: Lustre: 10309:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 03 11:08:27 fir-md1-s1 
kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562177300/real 1562177300] req@ffff8f40f3e09500 x1636723353746768/t0(0) o106->fir-MDT0002@10.8.1.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562177307 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 11:08:41 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562177314/real 1562177314] req@ffff8f40f3e09500 x1636723353746768/t0(0) o106->fir-MDT0002@10.8.1.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562177321 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 11:08:41 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 03 11:09:02 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562177335/real 1562177335] req@ffff8f40f3e09500 x1636723353746768/t0(0) o106->fir-MDT0002@10.8.1.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562177342 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 11:09:02 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 03 11:09:37 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562177370/real 1562177370] req@ffff8f40f3e09500 x1636723353746768/t0(0) o106->fir-MDT0002@10.8.1.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562177377 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 11:09:37 fir-md1-s1 kernel: Lustre: 23658:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 03 11:10:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7c6bb3e9-46cc-b495-a385-90ef422e1faf (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f252340f000, cur 1562177403 expire 1562177253 last 1562177176 Jul 03 11:10:03 fir-md1-s1 kernel: Lustre: 23658:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:97s); client may timeout. req@ffff8f3cf6e23300 x1637978060226256/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:26/0 lens 480/536 e 1 to 0 dl 1562177306 ref 1 fl Complete:/0/0 rc 301/301 Jul 03 11:10:03 fir-md1-s1 kernel: Lustre: 23658:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 25 previous similar messages Jul 03 11:12:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jul 03 11:12:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 11:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 903b60d0-f5a3-a51e-70de-07052e4bb832 (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148acc9000, cur 1562178041 expire 1562177891 last 1562177814 Jul 03 11:20:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 11:51:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 22280de3-e127-2943-2417-f27756433740 (at 10.9.0.81@o2ib4) Jul 03 11:51:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 12:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2306b587-2a3d-1ec9-bc70-ef2318848cbd (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4504d25c00, cur 1562180544 expire 1562180394 last 1562180317 Jul 03 12:02:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 12:33:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 22280de3-e127-2943-2417-f27756433740 (at 10.9.0.81@o2ib4) Jul 03 12:33:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 12:37:35 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 176128 GRANT, real grant 0 Jul 03 12:38:09 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 147456 GRANT, real grant 0 Jul 03 12:38:37 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 147456 GRANT, real grant 0 Jul 03 12:38:44 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 151552 GRANT, real grant 0 Jul 03 12:39:14 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 147456 GRANT, real grant 0 Jul 03 13:38:55 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 98ff8e84-1e9a-d223-7706-0c3e5612efc7 (at 10.8.0.82@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f350da7f800, cur 1562186335 expire 1562186185 last 1562186108 Jul 03 13:38:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 13:39:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ebb0ff39-b00e-6e1a-c25b-64754a77a1b9 (at 10.8.0.82@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2520743800, cur 1562186347 expire 1562186197 last 1562186120 Jul 03 13:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 336f7f9b-5dca-7bc0-f540-0bda4a5c5916 (at 10.9.106.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4518915000, cur 1562187328 expire 1562187178 last 1562187101 Jul 03 13:55:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 14:30:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 98ff8e84-1e9a-d223-7706-0c3e5612efc7 (at 10.8.0.82@o2ib6) Jul 03 14:30:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 14:33:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 3ec7adcc-54ba-9f81-9e8f-cc86aea17c81 (at 10.9.110.1@o2ib4) Jul 03 14:33:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 14:33:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c44cd238-8ca6-320e-a8ee-be68a5621e8e (at 10.9.110.2@o2ib4) Jul 03 14:33:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 03 15:26:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 58c0585b-a4ca-91b5-f3d9-6c740c9c6c69 (at 10.9.102.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2538047400, cur 1562192788 expire 1562192638 last 1562192561 Jul 03 15:26:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 03 15:33:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jul 03 15:33:48 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 03 15:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dc8c296c-90b6-4272-e4d3-a5c935663898 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34daabd400, cur 1562193240 expire 1562193090 last 1562193013 Jul 03 15:34:00 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 03 15:35:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c3b139fe-a52f-1c45-3280-dbbeca16676d (at 10.8.23.14@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. exp ffff8f1ec8a1a800, cur 1562193316 expire 1562193166 last 1562193099 Jul 03 15:35:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:35:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to acd26ab4-a020-fbc0-1a40-f0e7d759131f (at 10.8.23.14@o2ib6) Jul 03 15:35:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:42:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e3fe702d-7407-7671-1296-c76bd9eb9ca1 (at 10.9.113.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14c7775c00, cur 1562193778 expire 1562193628 last 1562193551 Jul 03 15:42:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:45:11 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:45:14 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:45:34 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:45:34 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jul 03 15:45:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 761c131c-61da-c461-162b-cc2b93210f35 (at 10.9.102.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f453beaa000, cur 1562193947 expire 1562193797 last 1562193720 Jul 03 15:45:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:45:47 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 03 15:45:47 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jul 03 15:45:59 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 03 15:45:59 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 03 15:46:16 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:46:16 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 69 previous similar messages Jul 03 15:46:55 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:46:55 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 03 15:48:10 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:48:10 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 95 previous similar messages Jul 03 15:49:21 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562194154/real 1562194154] req@ffff8f404bf65700 x1636723390245824/t0(0) o104->fir-MDT0000@10.9.115.6@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562194161 ref 1 fl 
Rpc:X/0/ffffffff rc 0/-1 Jul 03 15:49:21 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 03 15:49:29 fir-md1-s1 kernel: Lustre: 10586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f392d7d7b00 x1634999679427376/t0(0) o36->b60a3bb6-bbe2-b613-59ad-fb772c2a43bc@10.9.107.65@o2ib4:4/0 lens 512/448 e 1 to 0 dl 1562194174 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:49:30 fir-md1-s1 kernel: Lustre: 23565:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f317a320000 x1631621340917520/t0(0) o101->c4a74d2b-de98-9a37-7ebb-5f19657dadd1@10.9.108.2@o2ib4:5/0 lens 584/3264 e 1 to 0 dl 1562194175 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:49:35 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562194168/real 1562194168] req@ffff8f404bf65700 x1636723390245824/t0(0) o104->fir-MDT0000@10.9.115.6@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562194175 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 15:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b60a3bb6-bbe2-b613-59ad-fb772c2a43bc (at 10.9.107.65@o2ib4) reconnecting Jul 03 15:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f35d1ecc-fa81-1964-68b0-0ffaf770a8d3 (at 10.9.107.65@o2ib4) Jul 03 15:49:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:49:35 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 03 15:49:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a0e5688c-3919-4790-9111-00c22859e271 (at 10.9.108.2@o2ib4) Jul 03 15:49:40 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e39bb0900 x1631625914855264/t0(0) 
o101->b2acd6c0-c0f5-61d3-4a68-78d78ff1740e@10.8.27.13@o2ib6:15/0 lens 584/3264 e 1 to 0 dl 1562194185 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:49:40 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 03 15:49:42 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1682f9ec00 x1634620019363568/t0(0) o101->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:17/0 lens 584/3264 e 1 to 0 dl 1562194187 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b2acd6c0-c0f5-61d3-4a68-78d78ff1740e (at 10.8.27.13@o2ib6) reconnecting Jul 03 15:49:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 57743c79-31a8-108e-7e60-aa89857aef81 (at 10.8.27.13@o2ib6) Jul 03 15:49:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 15:49:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bfd7f797-6fd1-93d6-b01a-220fa07218f9 (at 10.9.112.10@o2ib4) Jul 03 15:49:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 15:49:56 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562194189/real 1562194189] req@ffff8f404bf65700 x1636723390245824/t0(0) o104->fir-MDT0000@10.9.115.6@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562194196 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 15:49:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f35d1ecc-fa81-1964-68b0-0ffaf770a8d3 (at 10.9.107.65@o2ib4) Jul 03 15:49:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:49:56 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 03 15:50:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
b2acd6c0-c0f5-61d3-4a68-78d78ff1740e (at 10.8.27.13@o2ib6) reconnecting Jul 03 15:50:07 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 03 15:50:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 57743c79-31a8-108e-7e60-aa89857aef81 (at 10.8.27.13@o2ib6) Jul 03 15:50:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 15:50:15 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-6), not sending early reply req@ffff8f2530bab000 x1637395057955728/t0(0) o101->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:19/0 lens 584/3264 e 0 to 0 dl 1562194219 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:50:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 57743c79-31a8-108e-7e60-aa89857aef81 (at 10.8.27.13@o2ib6) Jul 03 15:50:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 03 15:50:29 fir-md1-s1 kernel: Lustre: 10561:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f41d502dd00 x1634125092485888/t0(0) o101->2eaf5a11-c409-36b3-5d68-7ef19d1bd3f9@10.9.107.68@o2ib4:4/0 lens 584/3264 e 1 to 0 dl 1562194234 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:50:31 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562194224/real 1562194224] req@ffff8f404bf65700 x1636723390245824/t0(0) o104->fir-MDT0000@10.9.115.6@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562194231 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 15:50:31 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 03 15:50:32 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:50:32 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 
previous similar messages Jul 03 15:50:45 fir-md1-s1 kernel: LustreError: 10196:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562194155, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0513e7d7c0/0x5d9ee62e682e9560 lrc: 3/1,0 mode: --/PR res: [0x200011cf2:0x1b49:0x0].0x0 bits 0x13/0x0 rrc: 14 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10196 timeout: 0 lvb_type: 0 Jul 03 15:50:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6e48472f-542d-f444-2879-49b8d614290d (at 10.9.108.8@o2ib4) reconnecting Jul 03 15:50:48 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 03 15:50:49 fir-md1-s1 kernel: Lustre: 22288:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1821f8c200 x1634917573629072/t0(0) o101->bb6c1ebe-228f-c2b0-845a-14ae6de0b327@10.8.27.21@o2ib6:24/0 lens 584/3264 e 0 to 0 dl 1562194254 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:50:49 fir-md1-s1 kernel: Lustre: 22288:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 03 15:50:55 fir-md1-s1 kernel: LustreError: 97648:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562194165, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2063019680/0x5d9ee62e68494616 lrc: 3/1,0 mode: --/PR res: [0x200011cf2:0x1b49:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97648 timeout: 0 lvb_type: 0 Jul 03 15:50:55 fir-md1-s1 kernel: LustreError: 97648:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 03 15:51:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b5a4ca60-bdc7-f60e-1d00-41d316e40dac (at 10.9.108.3@o2ib4) Jul 03 15:51:03 fir-md1-s1 kernel: Lustre: Skipped 17 previous 
similar messages Jul 03 15:51:19 fir-md1-s1 kernel: LustreError: 20726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562194189, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1f2fcbcc80/0x5d9ee62e68616328 lrc: 3/1,0 mode: --/PR res: [0x200011cf2:0x1b49:0x0].0x0 bits 0x13/0x0 rrc: 17 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20726 timeout: 0 lvb_type: 0 Jul 03 15:51:19 fir-md1-s1 kernel: LustreError: 20726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 03 15:51:24 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1f39946900 x1634290379835008/t0(0) o101->9081d826-2f83-5b46-ff73-7e6473184838@10.8.17.25@o2ib6:29/0 lens 584/3264 e 0 to 0 dl 1562194289 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 15:51:24 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 03 15:51:41 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562194294/real 1562194294] req@ffff8f404bf65700 x1636723390245824/t0(0) o104->fir-MDT0000@10.9.115.6@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562194301 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 15:51:41 fir-md1-s1 kernel: Lustre: 10363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 03 15:51:48 fir-md1-s1 kernel: LustreError: 10363:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.115.6@o2ib4) failed to reply to blocking AST (req@ffff8f404bf65700 x1636723390245824 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1bacc38d80/0x5d9ee62e59b154c1 lrc: 4/0,0 mode: PR/PR res: [0x200011cf2:0x1b49:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.9.115.6@o2ib4 
remote: 0x6755c310f7297395 expref: 19 pid: 10332 timeout: 1309511 lvb_type: 0 Jul 03 15:51:48 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.115.6@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 03 15:51:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.115.6@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1bacc38d80/0x5d9ee62e59b154c1 lrc: 3/0,0 mode: PR/PR res: [0x200011cf2:0x1b49:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.9.115.6@o2ib4 remote: 0x6755c310f7297395 expref: 20 pid: 10332 timeout: 0 lvb_type: 0 Jul 03 15:51:49 fir-md1-s1 kernel: Lustre: 10333:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f3a4e70b600 x1635092894709312/t0(0) o101->40fe7f0a-1b2a-cef5-fe8d-06bb6237455c@10.9.108.5@o2ib4:18/0 lens 584/536 e 0 to 0 dl 1562194308 ref 1 fl Complete:/0/0 rc 0/0 Jul 03 15:52:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a545c53d-fd13-75ed-6bde-35d1aaac7a2f (at 10.9.106.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250d915400, cur 1562194321 expire 1562194171 last 1562194094 Jul 03 15:52:01 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 03 15:52:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to df993956-2257-9a73-35ef-341b2f75d156 (at 10.9.106.58@o2ib4) Jul 03 15:52:40 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 03 15:54:48 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 15:54:48 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 506 previous similar messages Jul 03 15:55:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0986c76b-92a3-5eb5-0ecd-38e58fcb1758 (at 10.8.26.34@o2ib6) Jul 03 15:55:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 03 15:55:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a0cea963-bad4-2a43-1e1e-2b16d5cc26b0 (at 10.9.113.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45355fa400, cur 1562194531 expire 1562194381 last 1562194304 Jul 03 15:55:31 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 03 15:58:15 fir-md1-s1 kernel: Lustre: 24576:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562194688/real 1562194688] req@ffff8f1612a6f500 x1636723394558688/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562194695 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 15:58:15 fir-md1-s1 kernel: Lustre: 24576:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 03 15:59:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 221c24d0-0082-781d-4acc-41656456a74c (at 10.9.106.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f451de73400, cur 1562194765 expire 1562194615 last 1562194538 Jul 03 15:59:25 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 03 15:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 556216e1-e907-ce15-d71c-dcbb67e6c0d6 (at 10.8.1.1@o2ib6) Jul 03 15:59:28 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 03 16:00:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.7@o2ib6, removing former export from same NID Jul 03 16:00:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 16:00:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 4343a906-23d9-f729-b768-bcd0549ada0d (at 10.8.8.37@o2ib6) reconnecting Jul 03 16:00:31 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 03 16:03:24 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 16:03:24 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 634 previous similar messages Jul 03 16:10:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ad179e1c-410d-6164-932b-33dee7383182 (at 10.9.114.8@o2ib4) Jul 03 16:10:19 fir-md1-s1 kernel: Lustre: Skipped 158 previous similar messages Jul 03 16:13:32 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 16:13:32 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 736 previous similar messages Jul 03 16:19:53 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562195986/real 1562195986] req@ffff8f1c3bbc6900 x1636723398952656/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562195993 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 16:19:53 fir-md1-s1 kernel: Lustre: 
97672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 03 16:20:01 fir-md1-s1 kernel: Lustre: 10559:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f43a8a71e00 x1638080845415248/t0(0) o101->cac1eba7-cdaa-957f-8735-d5169807717b@10.9.112.9@o2ib4:6/0 lens 1784/3288 e 1 to 0 dl 1562196006 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:20:01 fir-md1-s1 kernel: Lustre: 10559:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 03 16:20:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client cac1eba7-cdaa-957f-8735-d5169807717b (at 10.9.112.9@o2ib4) reconnecting Jul 03 16:20:07 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 03 16:20:14 fir-md1-s1 kernel: Lustre: 23682:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3f8baef500 x1636461767715776/t0(0) o101->cd3d0230-3738-e2d9-7e9f-2fd94c27579a@10.9.115.5@o2ib4:18/0 lens 1784/3288 e 1 to 0 dl 1562196018 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:20:14 fir-md1-s1 kernel: Lustre: 23682:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 03 16:20:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9f9c20f1-d776-b800-1cdc-f625bb18ebc2 (at 10.9.115.5@o2ib4) Jul 03 16:20:19 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 03 16:20:26 fir-md1-s1 kernel: Lustre: 21667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562196019/real 1562196019] req@ffff8f377bbb0c00 x1636723398983888/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562196026 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 16:20:26 fir-md1-s1 kernel: Lustre: 21667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15 previous similar messages Jul 03 16:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
cac1eba7-cdaa-957f-8735-d5169807717b (at 10.9.112.9@o2ib4) reconnecting Jul 03 16:20:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 16:21:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client cac1eba7-cdaa-957f-8735-d5169807717b (at 10.9.112.9@o2ib4) reconnecting Jul 03 16:21:10 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 03 16:21:20 fir-md1-s1 kernel: Lustre: 50442:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2af4f2a400 x1638080879728048/t0(0) o101->b3e5d320-3d62-bccd-461a-ac941a8ebc1b@10.9.112.8@o2ib4:25/0 lens 1784/3288 e 1 to 0 dl 1562196085 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:21:31 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562196084/real 1562196084] req@ffff8f1592759500 x1636723399224848/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562196091 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 16:21:31 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 59 previous similar messages Jul 03 16:22:06 fir-md1-s1 kernel: Lustre: 10143:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-8), not sending early reply req@ffff8f3263b90c00 x1638081177340544/t0(0) o101->8a102ad1-e9a6-7534-f996-e08c017dc5d4@10.9.113.6@o2ib4:11/0 lens 576/3264 e 0 to 0 dl 1562196131 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:22:06 fir-md1-s1 kernel: Lustre: 10143:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 03 16:22:20 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.4@o2ib6) failed to reply to blocking AST (req@ffff8f19ddfca700 x1636723398953968 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f18f9b4cc80/0x5d9ee62e657bbf94 lrc: 4/0,0 mode: PR/PR res: [0x2c002c33b:0x36cc:0x0].0x0 bits 0x13/0x0 
rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x181a13f4089308aa expref: 72634 pid: 97639 timeout: 1311342 lvb_type: 0 Jul 03 16:22:20 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.15.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 03 16:22:20 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f161e62b180/0x5d9ee62e657bb3e0 lrc: 3/0,0 mode: PR/PR res: [0x2c002c33b:0x36cb:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x181a13f40893062d expref: 72635 pid: 21429 timeout: 0 lvb_type: 0 Jul 03 16:22:20 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Jul 03 16:22:33 fir-md1-s1 kernel: LustreError: 23632:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0513e58300 x1636723399418960/t0(0) o104->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 03 16:22:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client cac1eba7-cdaa-957f-8735-d5169807717b (at 10.9.112.9@o2ib4) reconnecting Jul 03 16:22:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 03 16:23:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0da3bac3-60e3-7f8e-ab8a-bd9e331cf431 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f450119dc00, cur 1562196190 expire 1562196040 last 1562195963 Jul 03 16:23:10 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 03 16:23:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 92fd2bcd-71b5-44d8-7ea5-53f463aabbb9 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e9db27c00, cur 1562196202 expire 1562196052 last 1562195975 Jul 03 16:23:35 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 16:23:35 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 688 previous similar messages Jul 03 16:32:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 26ab021a-adb7-b814-3d61-a4e6dec4651f (at 10.8.9.9@o2ib6) reconnecting Jul 03 16:32:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.9@o2ib6, removing former export from same NID Jul 03 16:32:31 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 03 16:32:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9a5dee3-366f-e94e-5233-92c151efbd27 (at 10.8.9.9@o2ib6) Jul 03 16:32:31 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 03 16:33:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.9@o2ib6, removing former export from same NID Jul 03 16:33:51 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 16:33:51 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 807 previous similar messages Jul 03 16:34:33 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562196866/real 1562196866] req@ffff8f14ce1e4200 x1636723401226512/t0(0) o104->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562196873 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 16:34:33 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 86 previous similar messages Jul 03 16:34:41 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f3689a47800 x1638005242811616/t0(0) o101->8ef25a02-5cd5-8500-774d-d75ea76eaffd@10.9.112.15@o2ib4:16/0 lens 1784/3288 e 1 to 0 dl 1562196886 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:34:55 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562196887/real 1562196887] req@ffff8f14ce1e4200 x1636723401226512/t0(0) o104->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562196894 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 16:34:55 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 03 16:35:02 fir-md1-s1 kernel: LustreError: 23634:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.112.14@o2ib4) failed to reply to blocking AST (req@ffff8f14ce1e4200 x1636723401226512 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f0cfcff8000/0x5d9ee62e67f9e63c lrc: 4/0,0 mode: PR/PR res: [0x2c002c33b:0x370a:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.112.14@o2ib4 remote: 0x4b3e857652935417 expref: 39771 pid: 23589 timeout: 1311984 lvb_type: 0 Jul 03 16:35:02 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.112.14@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 03 16:35:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 03 16:35:02 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.9.112.14@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0cfcff8000/0x5d9ee62e67f9e63c lrc: 3/0,0 mode: PR/PR res: [0x2c002c33b:0x370a:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.112.14@o2ib4 remote: 0x4b3e857652935417 expref: 39772 pid: 23589 timeout: 0 lvb_type: 0 Jul 03 16:36:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.9@o2ib6, removing former export from same NID Jul 
03 16:36:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e79f4448-e890-1954-0996-0a25890d8ee5 (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14d2227000, cur 1562196973 expire 1562196823 last 1562196746 Jul 03 16:36:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 791a2ecd-fea5-54c9-c926-e06f2b6d4ac4 (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4538d74800, cur 1562196981 expire 1562196831 last 1562196754 Jul 03 16:36:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 16:36:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 03 16:36:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 03 16:37:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.9@o2ib6, removing former export from same NID Jul 03 16:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 73304416-dad3-e9c2-af6e-d3b1ea37367d (at 10.9.114.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148a5ff400, cur 1562197035 expire 1562196885 last 1562196808 Jul 03 16:37:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 16:40:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ba9d5bce-9de0-28f1-af07-112093ff61ad (at 10.9.106.51@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25210f5400, cur 1562197207 expire 1562197057 last 1562196980 Jul 03 16:43:27 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562197400/real 1562197400] req@ffff8f1c12ae8c00 x1636723403222368/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562197407 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 16:43:27 fir-md1-s1 kernel: Lustre: 97672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 03 16:43:45 fir-md1-s1 kernel: Lustre: 21483:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f201f396f00 x1633733151575392/t0(0) o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:20/0 lens 376/1600 e 0 to 0 dl 1562197430 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:43:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6) reconnecting Jul 03 16:43:51 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 03 16:43:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 220a94f1-3873-c0d2-13c3-2a8b3b58132e (at 10.8.29.8@o2ib6) Jul 03 16:43:51 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 03 16:44:06 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 147456 GRANT, real grant 0 Jul 03 16:44:06 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 709 previous similar messages Jul 03 16:44:33 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562197466/real 1562197466] req@ffff8f1e25ad3600 x1636723403347008/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562197473 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 16:44:33 fir-md1-s1 kernel: Lustre: 
97661:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 03 16:44:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dd15fe9e-fdc9-c67d-748d-ca571be05b29 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b17674400, cur 1562197475 expire 1562197325 last 1562197248 Jul 03 16:44:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 16:44:48 fir-md1-s1 kernel: LustreError: 97661:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1e25ad3600 x1636723403376704/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 03 16:45:05 fir-md1-s1 kernel: LustreError: 24586:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f222cec9200 x1636723403406320/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 03 16:45:16 fir-md1-s1 kernel: LustreError: 24585:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f250a7c3f00 x1636723403423344/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 03 16:45:41 fir-md1-s1 kernel: Lustre: 97643:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f243d812700 x1633733151608160/t0(0) o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:16/0 lens 480/568 e 0 to 0 dl 1562197546 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:45:59 fir-md1-s1 kernel: LustreError: 21460:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1de0677200 x1636723403515488/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 03 16:46:24 fir-md1-s1 kernel: Lustre: 97643:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1a8af5f200 x1633733151622352/t0(0) 
o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:29/0 lens 480/568 e 0 to 0 dl 1562197589 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:47:29 fir-md1-s1 kernel: LustreError: 21460:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562197559, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f20c67ede80/0x5d9ee62e83d4b33a lrc: 3/1,0 mode: --/PR res: [0x200029c11:0xfa:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21460 timeout: 0 lvb_type: 0 Jul 03 16:47:29 fir-md1-s1 kernel: LustreError: 21460:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Jul 03 16:48:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.15.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f24ed68d100/0x5d9ee62e7fcd33d9 lrc: 3/0,0 mode: PW/PW res: [0x200029c11:0xfa:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.15.6@o2ib6 remote: 0xba301179ab01cef5 expref: 922849 pid: 21483 timeout: 1312768 lvb_type: 0 Jul 03 16:49:19 fir-md1-s1 kernel: LNet: Service thread pid 21460 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Jul 03 16:49:19 fir-md1-s1 kernel: Pid: 21460, comm: mdt01_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Jul 03 16:49:19 fir-md1-s1 kernel: Call Trace:
Jul 03 16:49:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jul 03 16:49:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Jul 03 16:49:19 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Jul 03 16:49:19 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Jul 03 16:49:19 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Jul 03 16:49:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Jul 03 16:49:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Jul 03 16:49:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Jul 03 16:49:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Jul 03 16:49:19 fir-md1-s1 kernel: [] 0xffffffffffffffff
Jul 03 16:49:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562197759.21460
Jul 03 16:49:25 fir-md1-s1 kernel: LustreError: 97671:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1fdf429500 x1636723403988224/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Jul 03 16:50:02 fir-md1-s1 kernel: Lustre: 20511:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-17), not sending early reply
req@ffff8f1df53fe300 x1633733151651008/t0(0) o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:7/0 lens 376/1600 e 0 to 0 dl 1562197807 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 16:50:55 fir-md1-s1 kernel: LustreError: 97671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562197765, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f16241e3180/0x5d9ee62e8502b2f9 lrc: 3/0,1 mode: --/EX res: [0x200029c2b:0x351:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97671 timeout: 0 lvb_type: 0 Jul 03 16:51:54 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.15.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f4492e88b40/0x5d9ee62e80adff65 lrc: 3/0,0 mode: PR/PR res: [0x200029c2b:0x34f:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.6@o2ib6 remote: 0xba301179ab44a739 expref: 576551 pid: 22287 timeout: 1312974 lvb_type: 0 Jul 03 16:52:45 fir-md1-s1 kernel: LNet: Service thread pid 97664 was inactive for 200.07s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Jul 03 16:52:45 fir-md1-s1 kernel: Pid: 97664, comm: mdt01_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Jul 03 16:52:45 fir-md1-s1 kernel: Call Trace:
Jul 03 16:52:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jul 03 16:52:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Jul 03 16:52:45 fir-md1-s1 kernel: [] mdt_layout_change+0x2a4/0x430 [mdt]
Jul 03 16:52:45 fir-md1-s1 kernel: [] mdt_intent_layout+0x7ee/0xcc0 [mdt]
Jul 03 16:52:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Jul 03 16:52:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jul 03 16:52:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Jul 03 16:52:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Jul 03 16:52:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Jul 03 16:52:46 fir-md1-s1 kernel: [] 0xffffffffffffffff
Jul 03 16:52:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562197966.97664
Jul 03 16:54:07 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0
Jul 03 16:54:07 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 742 previous similar messages
Jul 03 16:54:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6) reconnecting
Jul 03 16:54:16 fir-md1-s1 kernel:
Lustre: Skipped 17 previous similar messages Jul 03 16:54:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 220a94f1-3873-c0d2-13c3-2a8b3b58132e (at 10.8.29.8@o2ib6) Jul 03 16:54:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 03 16:56:23 fir-md1-s1 kernel: LNet: Service thread pid 21460 completed after 624.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 03 16:57:17 fir-md1-s1 kernel: LNet: Service thread pid 97664 completed after 472.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 03 17:04:17 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 17:04:17 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 751 previous similar messages Jul 03 17:09:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 86b912bf-2e5b-c1ac-9553-f5e705cfca02 (at 10.9.106.51@o2ib4) Jul 03 17:09:55 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 03 17:14:18 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 03 17:14:18 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 608 previous similar messages Jul 03 17:22:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 98a67850-1b7c-ef40-1816-b3372d04b91a (at 10.9.104.26@o2ib4) Jul 03 17:22:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 17:24:18 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 17:24:18 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 664 
previous similar messages Jul 03 17:27:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d122243-83ef-341e-1d9e-5ad0fa272beb (at 10.9.114.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252ffd3400, cur 1562200067 expire 1562199917 last 1562199840 Jul 03 17:27:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 17:32:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to df993956-2257-9a73-35ef-341b2f75d156 (at 10.9.106.58@o2ib4) Jul 03 17:32:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 03 17:34:19 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 17:34:19 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 842 previous similar messages Jul 03 17:44:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 86b912bf-2e5b-c1ac-9553-f5e705cfca02 (at 10.9.106.51@o2ib4) Jul 03 17:44:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 03 17:44:21 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 17:44:21 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 573 previous similar messages Jul 03 17:51:59 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 03 17:54:28 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 17:54:28 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 743 previous similar messages Jul 03 17:54:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f174f128-4488-2485-c92d-799c5cc7f49d (at 
10.9.104.27@o2ib4) Jul 03 17:54:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 03 18:04:47 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 18:04:47 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 869 previous similar messages Jul 03 18:05:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7cdd6fe1-f6f2-0a49-df73-de49ebbd85ff (at 10.9.101.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd146400, cur 1562202309 expire 1562202159 last 1562202082 Jul 03 18:05:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 18:14:47 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 18:14:47 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 607 previous similar messages Jul 03 18:24:55 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 18:24:55 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 763 previous similar messages Jul 03 18:31:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bff84d1e-0a69-b6c4-379f-b22c9974d598 (at 10.9.114.3@o2ib4) Jul 03 18:31:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 03 18:33:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02ea0e3d-c72b-2664-4a33-3841a13fb806 (at 10.9.101.55@o2ib4) Jul 03 18:33:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 18:35:44 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 18:35:44 
fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 864 previous similar messages Jul 03 18:46:03 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 18:46:03 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 638 previous similar messages Jul 03 18:56:12 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 18:56:12 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 811 previous similar messages Jul 03 19:06:21 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 19:06:21 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 609 previous similar messages Jul 03 19:16:46 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 19:16:46 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 804 previous similar messages Jul 03 19:27:47 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 19:27:47 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 887 previous similar messages Jul 03 19:37:58 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 19:37:58 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 651 previous similar messages Jul 03 19:47:58 
fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 19:47:58 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 803 previous similar messages Jul 03 19:52:25 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:52:30 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:52:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 03 19:52:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 03 19:52:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 03 19:52:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 19:52:36 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:52:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to fba120c3-4cd2-22a3-cb05-96d005aa975a (at 10.8.21.2@o2ib6) Jul 03 19:52:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.2@o2ib6, removing former export from same NID Jul 03 19:52:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.27.19@o2ib6, removing former export from same NID Jul 03 19:52:58 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 03 19:53:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.19@o2ib6, removing former export from same NID Jul 03 19:53:00 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 03 19:53:01 fir-md1-s1 kernel: Lustre: 
24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1562208773/real 0] req@ffff8f1b6671cb00 x1636723431349072/t0(0) o106->fir-MDT0000@10.8.28.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562208781 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 19:53:01 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 03 19:53:04 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:53:04 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 03 19:53:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.25.12@o2ib6, removing former export from same NID Jul 03 19:53:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 03 19:53:05 fir-md1-s1 kernel: LustreError: 44037:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f21c9b4f450 x1635707640078096/t0(0) o3->4ed462a8-ed6a-0891-ced6-ebadfda1f88d@10.8.8.30@o2ib6:27/0 lens 488/440 e 0 to 0 dl 1562208807 ref 1 fl Interpret:/0/0 rc 0/0 Jul 03 19:53:06 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1eb5573a00 Jul 03 19:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 4ed462a8-ed6a-0891-ced6-ebadfda1f88d (at 10.8.8.30@o2ib6), client will retry: rc -110 Jul 03 19:53:08 fir-md1-s1 kernel: Lustre: 26256:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22f020dd00 x1638067932108544/t0(0) o101->b041cef5-fff9-4fc6-cc5f-62c5a80e124b@10.9.0.81@o2ib4:13/0 lens 480/568 e 1 to 0 dl 1562208793 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 19:53:08 fir-md1-s1 kernel: Lustre: 26256:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 03 19:53:15 
fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.10@o2ib6, removing former export from same NID Jul 03 19:53:15 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 03 19:53:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 469d3c01-0ba5-8df1-fade-b379f197d2fe (at 10.8.27.33@o2ib6), client will retry: rc = -110 Jul 03 19:53:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 19:53:27 fir-md1-s1 kernel: Lustre: 22004:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1562208796/real 0] req@ffff8f1e2f262700 x1636723431389552/t0(0) o104->fir-MDT0002@10.8.8.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562208807 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 19:53:27 fir-md1-s1 kernel: Lustre: 22004:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 03 19:53:29 fir-md1-s1 kernel: Lustre: 97644:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d1c1daa00 x1633726786217616/t0(0) o101->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:4/0 lens 480/568 e 0 to 0 dl 1562208814 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 19:53:33 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:53:33 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 03 19:53:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.16@o2ib6, removing former export from same NID Jul 03 19:53:34 fir-md1-s1 kernel: Lustre: Skipped 501 previous similar messages Jul 03 19:53:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to de0940aa-281f-ee72-6d66-43860c09ff15 (at 10.8.17.16@o2ib6) Jul 03 19:53:34 fir-md1-s1 kernel: Lustre: Skipped 718 previous similar messages Jul 03 19:53:43 fir-md1-s1 kernel: LustreError: 
46531:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2005584850 x1633726786220736/t0(0) o4->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:3/0 lens 488/448 e 0 to 0 dl 1562208843 ref 1 fl Interpret:/0/0 rc 0/0 Jul 03 19:53:43 fir-md1-s1 kernel: LustreError: 46531:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 03 19:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c977be3c-f98f-fbec-3aac-245ba5109971 (at 10.8.30.35@o2ib6) reconnecting Jul 03 19:53:46 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 03 19:53:47 fir-md1-s1 kernel: LustreError: 46526:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f2005585450 x1634456217841056/t0(0) o4->b95afc0f-d5ce-0d5e-e5e9-03cd8d169d60@10.8.8.12@o2ib6:17/0 lens 504/448 e 1 to 0 dl 1562208827 ref 1 fl Interpret:/2/0 rc 0/0 Jul 03 19:53:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with b95afc0f-d5ce-0d5e-e5e9-03cd8d169d60 (at 10.8.8.12@o2ib6), client will retry: rc = -110 Jul 03 19:53:50 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:4s); client may timeout. req@ffff8f203b635400 x1634928116944112/t349864063350(0) o101->36c50ebf-42f1-2e51-f789-02d6d7eec692@10.8.8.33@o2ib6:16/0 lens 376/944 e 0 to 0 dl 1562208826 ref 1 fl Complete:/0/0 rc 0/0 Jul 03 19:53:52 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:5s); client may timeout. 
req@ffff8f20e8993300 x1633726786220496/t0(0) o101->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:17/0 lens 480/536 e 0 to 0 dl 1562208827 ref 1 fl Complete:/0/0 rc 0/0 Jul 03 19:53:54 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22d5a19e00 Jul 03 19:53:54 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f217734f850 x1634525650098400/t0(0) o4->2ee51d45-426d-bbd9-5b4f-485a0917e8b9@10.8.17.18@o2ib6:21/0 lens 504/448 e 1 to 0 dl 1562208831 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 03 19:53:58 fir-md1-s1 kernel: LustreError: 44037:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f217734bc50 x1633733160145008/t0(0) o4->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:28/0 lens 488/448 e 1 to 0 dl 1562208838 ref 1 fl Interpret:/2/0 rc 0/0 Jul 03 19:53:58 fir-md1-s1 kernel: LustreError: 44037:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 03 19:54:00 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f181c0a8600 Jul 03 19:54:02 fir-md1-s1 kernel: Lustre: 46562:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1c43eee050 x1631306790892784/t0(0) o3->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:7/0 lens 488/440 e 0 to 0 dl 1562208847 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 19:54:02 fir-md1-s1 kernel: Lustre: 46562:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Jul 03 19:54:04 fir-md1-s1 kernel: Lustre: 97665:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:11s); client may timeout. 
req@ffff8f17c3d7dd00 x1633726786220688/t0(0) o101->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:23/0 lens 480/536 e 0 to 0 dl 1562208833 ref 1 fl Complete:/0/0 rc 0/0 Jul 03 19:54:05 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:54:05 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 03 19:54:05 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f159e9c0000 Jul 03 19:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6), client will retry: rc = -110 Jul 03 19:54:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 19:54:05 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d89fd7400 Jul 03 19:54:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.15.6@o2ib6, removing former export from same NID Jul 03 19:54:11 fir-md1-s1 kernel: Lustre: Skipped 350 previous similar messages Jul 03 19:54:14 fir-md1-s1 kernel: Lustre: 97648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1562208847/real 0] req@ffff8f1c368e0600 x1636723431470800/t0(0) o104->fir-MDT0002@10.8.1.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562208854 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 19:54:14 fir-md1-s1 kernel: LustreError: 21865:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1c18195d00 x1637979061666736/t0(0) o37->4dda764c-5ca7-3340-a1d3-17b756c64805@10.8.0.67@o2ib6:14/0 lens 448/440 e 1 to 0 dl 1562208854 ref 1 fl Interpret:/0/0 rc 0/0 Jul 03 19:54:14 fir-md1-s1 kernel: Lustre: 97648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 03 19:54:15 fir-md1-s1 
kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f04d00400 Jul 03 19:54:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 9eed212b-34d9-6e26-f1ac-cdc452decf97 (at 10.8.29.3@o2ib6), client will retry: rc -110 Jul 03 19:54:21 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16630c0a00 Jul 03 19:54:21 fir-md1-s1 kernel: Lustre: 21865:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:7s); client may timeout. req@ffff8f1c18195d00 x1637979061666736/t0(0) o37->4dda764c-5ca7-3340-a1d3-17b756c64805@10.8.0.67@o2ib6:14/0 lens 448/408 e 1 to 0 dl 1562208854 ref 1 fl Complete:/0/0 rc -110/-110 Jul 03 19:54:21 fir-md1-s1 kernel: Lustre: 21865:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 03 19:54:23 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1cca9d3a00 Jul 03 19:54:27 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20c124b400 Jul 03 19:54:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6), client will retry: rc -110 Jul 03 19:54:27 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f195da25200 Jul 03 19:54:28 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e59365600 Jul 03 19:54:29 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17cc6f9c00 Jul 03 19:54:29 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bff13f800 Jul 03 19:54:33 fir-md1-s1 kernel: LustreError: 
20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dcbf9c000 Jul 03 19:54:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6), client will retry: rc -110 Jul 03 19:54:33 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 03 19:54:35 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d1d2ede00 Jul 03 19:54:37 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f9ba66600 Jul 03 19:54:41 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e5926d800 Jul 03 19:54:44 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:21s); client may timeout. req@ffff8f1b6671a100 x1631547122810496/t349864079075(0) o101->a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0@10.8.8.32@o2ib6:23/0 lens 416/944 e 0 to 0 dl 1562208863 ref 1 fl Complete:/0/0 rc 0/0 Jul 03 19:54:44 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 03 19:54:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 56s: evicting client at 10.8.2.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f224e8b69c0/0x5d9ee62ec5bd374f lrc: 3/0,0 mode: PW/PW res: [0x2c002be96:0x4918:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.2.20@o2ib6 remote: 0xbfd4fbab82a26d7b expref: 2292 pid: 97648 timeout: 1323945 lvb_type: 0 Jul 03 19:54:47 fir-md1-s1 kernel: LustreError: 21388:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1f5875bc50 x1631552256687440/t0(0) o4->8167f9b2-58bb-1a00-523a-9433a074fe32@10.8.27.28@o2ib6:17/0 lens 520/456 e 1 to 0 dl 1562208887 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 03 19:54:47 fir-md1-s1 kernel: LustreError: 21388:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 03 19:54:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6c3e9cd1-a2aa-e356-67b4-60b86ef1d3c6 (at 10.8.16.6@o2ib6) Jul 03 19:54:49 fir-md1-s1 kernel: Lustre: Skipped 956 previous similar messages Jul 03 19:54:49 fir-md1-s1 kernel: LustreError: 20367:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.2.20@o2ib6 arrived at 1562208889 with bad export cookie 6746082289093297427 Jul 03 19:54:49 fir-md1-s1 kernel: LustreError: 20367:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 2 previous similar messages Jul 03 19:54:49 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a05616000 Jul 03 19:54:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8167f9b2-58bb-1a00-523a-9433a074fe32 (at 10.8.27.28@o2ib6), client will retry: rc = -110 Jul 03 19:54:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 19:54:51 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f196655e800 Jul 03 19:55:04 fir-md1-s1 kernel: LustreError: 22009:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.2.20@o2ib6 arrived at 1562208904 with bad export cookie 6746082289093297427 Jul 03 19:55:11 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1852ca5050 x1634456217841056/t0(0) o4->b95afc0f-d5ce-0d5e-e5e9-03cd8d169d60@10.8.8.12@o2ib6:16/0 lens 504/448 e 1 to 0 dl 1562208916 ref 2 fl Interpret:/2/0 rc 0/0 Jul 03 19:55:11 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Jul 03 19:55:12 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent 
state, don't perform health checking (-125, 0) Jul 03 19:55:12 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 17 previous similar messages Jul 03 19:55:16 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ea5afc400 Jul 03 19:55:18 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bf4434e00 Jul 03 19:55:20 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1852ca3050 x1634525650098400/t0(0) o4->2ee51d45-426d-bbd9-5b4f-485a0917e8b9@10.8.17.18@o2ib6:20/0 lens 504/448 e 1 to 0 dl 1562208920 ref 1 fl Interpret:/2/0 rc 0/0 Jul 03 19:55:20 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 03 19:55:20 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f243d944600 Jul 03 19:55:21 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24407e5400 Jul 03 19:55:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.20@o2ib6, removing former export from same NID Jul 03 19:55:26 fir-md1-s1 kernel: Lustre: Skipped 1066 previous similar messages Jul 03 19:55:30 fir-md1-s1 kernel: Lustre: 97648:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:5s); client may timeout. 
req@ffff8f172aad6600 x1631555520463888/t0(0) o101->d36980b7-2b04-f724-0e6b-cf989e4d7da2@10.8.1.34@o2ib6:25/0 lens 480/536 e 0 to 0 dl 1562208925 ref 1 fl Complete:/0/0 rc 0/0 Jul 03 19:55:30 fir-md1-s1 kernel: Lustre: 97648:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Jul 03 19:55:33 fir-md1-s1 kernel: Lustre: 23748:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1562208926/real 0] req@ffff8f2d3aa67200 x1636723431595232/t0(0) o104->fir-MDT0000@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562208933 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 19:55:33 fir-md1-s1 kernel: Lustre: 23748:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jul 03 19:55:51 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5f3b8986-88bc-dd5d-4c41-5670b4e69c0b (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0e3253a000, cur 1562208951 expire 1562208801 last 1562208724 Jul 03 19:55:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 03 19:55:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 03 19:56:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 03 19:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2c0bfc93-71cb-f565-f1fb-8f804a23ec4c (at 10.8.1.26@o2ib6) reconnecting Jul 03 19:56:20 fir-md1-s1 kernel: Lustre: Skipped 235 previous similar messages Jul 03 19:57:25 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 03 19:57:25 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 24 previous similar messages Jul 03 19:57:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 837c124c-41d9-368d-aae3-f10235137c33 (at 10.8.18.3@o2ib6) Jul 03 19:57:44 fir-md1-s1 kernel: Lustre: Skipped 1032 previous similar messages Jul 03 19:57:47 fir-md1-s1 kernel: Lustre: 22282:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1562209055/real 0] req@ffff8f41b680f500 x1636723431843824/t0(0) o104->fir-MDT0002@10.8.2.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562209066 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 19:57:47 fir-md1-s1 kernel: Lustre: 22282:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages Jul 03 19:57:50 fir-md1-s1 kernel: LustreError: 21708:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1852ca2c50 x1635707640142560/t0(0) o4->4ed462a8-ed6a-0891-ced6-ebadfda1f88d@10.8.8.30@o2ib6:11/0 lens 488/448 e 0 to 0 dl 1562209091 ref 1 fl Interpret:/0/0 rc 0/0 Jul 03 19:57:50 fir-md1-s1 kernel: LustreError: 21708:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 11 previous similar messages Jul 03 19:57:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22c3abd000 Jul 03 19:57:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9dcf2f2b-339d-b96d-0792-e79b27f28969 (at 10.8.28.2@o2ib6), client will retry: rc -110 Jul 03 19:57:59 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 03 
19:57:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16e282ac00 Jul 03 19:58:01 fir-md1-s1 kernel: Lustre: 97643:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f16666f3000 x1635707640142496/t0(0) o101->4ed462a8-ed6a-0891-ced6-ebadfda1f88d@10.8.8.30@o2ib6:6/0 lens 376/976 e 0 to 0 dl 1562209086 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 19:58:01 fir-md1-s1 kernel: Lustre: 97643:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Jul 03 19:58:03 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b7c255a00 Jul 03 19:58:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 4ed462a8-ed6a-0891-ced6-ebadfda1f88d (at 10.8.8.30@o2ib6), client will retry: rc = -110 Jul 03 19:58:03 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 03 19:58:10 fir-md1-s1 kernel: LustreError: 46535:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1852ca3c50 x1635200122088912/t0(0) o3->018b4088-9100-7f5b-2709-38dd7f461ac7@10.8.8.29@o2ib6:10/0 lens 488/440 e 1 to 0 dl 1562209090 ref 1 fl Interpret:/0/0 rc 0/0 Jul 03 19:58:10 fir-md1-s1 kernel: LustreError: 46535:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 03 19:58:19 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e470e5a00 Jul 03 19:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with aba5d4eb-e07c-9b0f-6ab5-7f97caf38a26 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 03 19:58:20 fir-md1-s1 kernel: Lustre: 46532:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. 
req@ffff8f1852ca7c50 x1631583824738768/t0(0) o3->aba5d4eb-e07c-9b0f-6ab5-7f97caf38a26@10.8.16.4@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1562209097 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 03 19:58:20 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 44s: evicting client at 10.8.8.30@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f346d2a3600/0x5d9ee62ec6a33ef9 lrc: 4/0,0 mode: EX/EX res: [0x2c002bedb:0xeec5:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.8.8.30@o2ib6 remote: 0x44bc588de19b9b76 expref: 14320 pid: 97643 timeout: 1324160 lvb_type: 3 Jul 03 19:58:21 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a142ea600 Jul 03 19:58:23 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 19:58:23 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 643 previous similar messages Jul 03 19:58:23 fir-md1-s1 kernel: LustreError: 83752:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f06f27ef800 x1636723432054432/t0(0) o105->fir-MDT0002@10.8.8.30@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 03 19:58:23 fir-md1-s1 kernel: LustreError: 83752:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 03 19:58:24 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1885204000 Jul 03 19:58:25 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d007c1200 Jul 03 19:58:25 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21c382dc00 Jul 03 19:58:25 fir-md1-s1 kernel: LustreError: 
20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16627d1200 Jul 03 19:58:27 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a31af6800 Jul 03 19:58:27 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a9fcffc00 Jul 03 19:58:29 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d1c1da200 Jul 03 19:58:29 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20cb0c3600 Jul 03 19:58:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17fa734c00 Jul 03 19:58:30 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20de015a00 Jul 03 19:58:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20d73fd800 Jul 03 19:58:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e341d8000 Jul 03 19:58:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f251f04fc00 Jul 03 19:58:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20f3471800 Jul 03 19:58:33 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f173e89b200 Jul 03 19:58:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.9@o2ib6, removing former export from same NID Jul 03 19:58:41 fir-md1-s1 kernel: Lustre: Skipped 309 previous similar messages Jul 03 19:58:41 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f2048977200 Jul 03 19:58:49 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f23a926b200 Jul 03 19:58:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 03 19:58:53 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2318d27200 Jul 03 19:58:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6), client will retry: rc -110 Jul 03 19:58:53 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 03 19:59:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 03 20:00:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.9.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 03 20:08:44 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 20:08:44 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 797 previous similar messages Jul 03 20:19:03 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 20:19:03 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 832 previous similar messages Jul 03 20:29:10 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 20:29:10 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 623 previous similar messages Jul 03 20:40:56 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 03 20:40:56 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 735 previous similar messages Jul 03 20:51:02 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 20:51:02 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 805 previous similar messages Jul 03 21:01:05 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 21:01:05 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 843 previous similar messages Jul 03 21:11:12 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 21:11:12 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 623 previous similar messages Jul 03 21:21:15 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 21:21:15 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 807 previous similar messages Jul 03 21:32:27 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 21:32:27 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 724 previous similar messages Jul 03 21:42:48 fir-md1-s1 kernel: LustreError: 21289:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 21:42:48 fir-md1-s1 kernel: LustreError: 21289:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 795 previous similar messages Jul 03 21:52:52 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 21:52:52 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 853 previous similar messages Jul 03 22:03:15 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 22:03:15 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 610 previous similar messages Jul 03 22:13:19 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 22:13:19 fir-md1-s1 kernel: LustreError: 
46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 859 previous similar messages Jul 03 22:22:44 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562217757/real 1562217757] req@ffff8f1fca084200 x1636723449609552/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562217764 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 03 22:22:44 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Jul 03 22:22:52 fir-md1-s1 kernel: Lustre: 26255:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19e8a6cb00 x1631600744376720/t0(0) o36->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:27/0 lens 496/448 e 1 to 0 dl 1562217777 ref 2 fl Interpret:/0/0 rc 0/0 Jul 03 22:22:52 fir-md1-s1 kernel: Lustre: 26255:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Jul 03 22:22:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6) reconnecting Jul 03 22:22:58 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 03 22:22:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6e32fe6b-eec6-274e-37cd-da661cf9bf17 (at 10.8.29.1@o2ib6) Jul 03 22:22:58 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 03 22:23:19 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562217792/real 1562217792] req@ffff8f1fca084200 x1636723449609552/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562217799 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 22:23:19 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 03 22:23:33 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 22:23:33 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 619 previous similar messages Jul 03 22:23:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6) reconnecting Jul 03 22:23:40 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 03 22:23:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6e32fe6b-eec6-274e-37cd-da661cf9bf17 (at 10.8.29.1@o2ib6) Jul 03 22:23:40 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 03 22:24:07 fir-md1-s1 kernel: LustreError: 97650:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562217757, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1d73512d00/0x5d9ee62f0ba095df lrc: 3/1,0 mode: --/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97650 timeout: 0 lvb_type: 0 Jul 03 22:24:07 fir-md1-s1 kernel: LustreError: 97650:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 03 22:24:08 fir-md1-s1 kernel: LustreError: 20460:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562217758, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1c3d2a8000/0x5d9ee62f0ba69689 lrc: 3/1,0 mode: --/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20460 timeout: 0 lvb_type: 0 Jul 03 22:24:11 fir-md1-s1 kernel: LustreError: 97643:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562217761, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff8f2531dfad00/0x5d9ee62f0bb92624 lrc: 3/1,0 mode: --/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97643 timeout: 0 lvb_type: 0 Jul 03 22:24:17 fir-md1-s1 kernel: LustreError: 23567:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562217767, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f051862b180/0x5d9ee62f0bd2a6b5 lrc: 3/1,0 mode: --/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23567 timeout: 0 lvb_type: 0 Jul 03 22:24:17 fir-md1-s1 kernel: LustreError: 23567:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 03 22:24:25 fir-md1-s1 kernel: LustreError: 20728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562217775, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f159eea0fc0/0x5d9ee62f0bfab832 lrc: 3/1,0 mode: --/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20728 timeout: 0 lvb_type: 0 Jul 03 22:24:25 fir-md1-s1 kernel: LustreError: 20728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jul 03 22:24:29 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562217862/real 1562217862] req@ffff8f1fca084200 x1636723449609552/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562217869 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 03 22:24:29 fir-md1-s1 kernel: Lustre: 50446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 03 22:24:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting Jul 03 22:24:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 03 22:24:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 8651a829-1584-35b1-6264-26a8d5433bb6 (at 10.9.113.3@o2ib4) Jul 03 22:24:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 03 22:25:11 fir-md1-s1 kernel: LustreError: 50446:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.6@o2ib6) failed to reply to blocking AST (req@ffff8f1fca084200 x1636723449609552 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f17bdf31440/0x5d9ee62f0acb4901 lrc: 4/0,0 mode: PR/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.8.15.6@o2ib6 remote: 0x71e36d96c02791d expref: 16638 pid: 21460 timeout: 1333113 lvb_type: 0 Jul 03 22:25:11 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.15.6@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 03 22:25:11 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.15.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f17bdf31440/0x5d9ee62f0acb4901 lrc: 3/0,0 mode: PR/PR res: [0x2000297d4:0x4a2:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.8.15.6@o2ib6 remote: 0x71e36d96c02791d expref: 16639 pid: 21460 timeout: 0 lvb_type: 0 Jul 03 22:26:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bde0b95a-d079-f6a9-2817-38a3e98f4627 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1615f91c00, cur 1562217966 expire 1562217816 last 1562217739 Jul 03 22:33:56 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 22:33:56 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 800 previous similar messages Jul 03 22:44:04 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 22:44:04 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 634 previous similar messages Jul 03 22:54:16 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 22:54:16 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 782 previous similar messages Jul 03 23:04:30 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 23:04:30 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 673 previous similar messages Jul 03 23:14:33 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 23:14:33 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 802 previous similar messages Jul 03 23:24:39 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 23:24:39 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 623 previous similar messages Jul 03 23:34:39 fir-md1-s1 kernel: LustreError: 
46591:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 23:34:39 fir-md1-s1 kernel: LustreError: 46591:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 793 previous similar messages Jul 03 23:39:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5f4bee65-bf6b-ad1e-3c5b-2158af12057b (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25219b3800, cur 1562222387 expire 1562222237 last 1562222160 Jul 03 23:39:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 03 23:44:44 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 23:44:44 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 556 previous similar messages Jul 03 23:54:44 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 03 23:54:44 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 856 previous similar messages Jul 04 00:04:59 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 00:04:59 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 562 previous similar messages Jul 04 00:15:03 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 00:15:03 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 860 previous similar messages Jul 04 00:25:16 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf 
claims 155648 GRANT, real grant 0 Jul 04 00:25:16 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 565 previous similar messages Jul 04 00:36:17 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 00:36:17 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 873 previous similar messages Jul 04 00:46:39 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 00:46:39 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 644 previous similar messages Jul 04 00:56:45 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 00:56:45 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 817 previous similar messages Jul 04 01:07:05 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 01:07:05 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 604 previous similar messages Jul 04 01:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 07c1712c-9739-2dce-4883-ed8d604a7bd1 (at 10.8.15.3@o2ib6) reconnecting Jul 04 01:14:02 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 04 01:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 420c129b-df9e-b1c5-eae5-667fed64bb9d (at 10.8.15.3@o2ib6) Jul 04 01:14:02 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 04 01:14:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 07c1712c-9739-2dce-4883-ed8d604a7bd1 (at 10.8.15.3@o2ib6) reconnecting Jul 04 01:14:33 fir-md1-s1 
kernel: Lustre: fir-MDT0000: Connection restored to 420c129b-df9e-b1c5-eae5-667fed64bb9d (at 10.8.15.3@o2ib6) Jul 04 01:14:45 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f189e7a2a00 x1631537913284992/t0(0) o101->d3013375-2e90-b76e-c4d8-76867f2b4a32@10.8.2.20@o2ib6:20/0 lens 480/568 e 1 to 0 dl 1562228090 ref 2 fl Interpret:/0/0 rc 0/0 Jul 04 01:14:45 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Jul 04 01:17:09 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 01:17:09 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 794 previous similar messages Jul 04 01:27:24 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 01:27:24 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 565 previous similar messages Jul 04 01:37:38 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 01:37:38 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 808 previous similar messages Jul 04 01:48:41 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 01:48:41 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 923 previous similar messages Jul 04 01:58:53 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 
01:58:53 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 645 previous similar messages Jul 04 02:09:10 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 02:09:10 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 853 previous similar messages Jul 04 02:19:16 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 02:19:16 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 617 previous similar messages Jul 04 02:29:34 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 02:29:34 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 806 previous similar messages Jul 04 02:39:58 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 02:39:58 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 615 previous similar messages Jul 04 02:50:06 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 02:50:06 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 859 previous similar messages Jul 04 03:00:24 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 03:00:24 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 606 previous similar messages Jul 04 03:11:19 
fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 03:11:19 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 884 previous similar messages Jul 04 03:21:26 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 03:21:26 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 653 previous similar messages Jul 04 03:31:27 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 03:31:27 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 583 previous similar messages Jul 04 03:41:51 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 03:41:51 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 868 previous similar messages Jul 04 03:51:53 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 03:51:53 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 629 previous similar messages Jul 04 04:02:09 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 04:02:09 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 835 previous similar messages Jul 04 04:12:27 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 
155648 GRANT, real grant 0 Jul 04 04:12:27 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 617 previous similar messages Jul 04 04:22:30 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 04:22:30 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 819 previous similar messages Jul 04 04:32:31 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 04:32:31 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jul 04 04:44:07 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 04:44:07 fir-md1-s1 kernel: LustreError: 21451:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 738 previous similar messages Jul 04 04:54:15 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 04:54:15 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 797 previous similar messages Jul 04 05:04:17 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 05:04:17 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 649 previous similar messages Jul 04 05:14:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 05:14:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 792 previous 
similar messages Jul 04 05:24:45 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 05:24:45 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 621 previous similar messages Jul 04 05:35:23 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 05:35:23 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 833 previous similar messages Jul 04 05:45:30 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 05:45:30 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 661 previous similar messages Jul 04 05:57:14 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jul 04 05:57:14 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 780 previous similar messages Jul 04 06:07:21 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 06:07:21 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 793 previous similar messages Jul 04 06:17:42 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 06:17:42 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 594 previous similar messages Jul 04 06:28:08 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 06:28:08 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 849 previous similar messages Jul 04 06:38:32 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 06:38:32 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 600 previous similar messages Jul 04 06:48:56 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 06:48:56 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 622 previous similar messages Jul 04 06:59:00 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 06:59:00 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 846 previous similar messages Jul 04 07:09:08 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 07:09:08 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 569 previous similar messages Jul 04 07:19:17 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 07:19:17 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 764 previous similar messages Jul 04 07:29:30 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 07:29:30 fir-md1-s1 kernel: LustreError: 
21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 691 previous similar messages Jul 04 07:39:31 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 07:39:31 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 572 previous similar messages Jul 04 07:49:37 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 07:49:37 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 552 previous similar messages Jul 04 07:59:51 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 07:59:51 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 617 previous similar messages Jul 04 08:09:54 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 08:09:54 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 625 previous similar messages Jul 04 08:20:33 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jul 04 08:20:33 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 713 previous similar messages Jul 04 08:32:34 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 08:32:34 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 730 previous similar messages Jul 04 08:44:17 fir-md1-s1 kernel: LustreError: 
46555:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 08:44:17 fir-md1-s1 kernel: LustreError: 46555:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 727 previous similar messages Jul 04 08:56:06 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 08:56:06 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 727 previous similar messages Jul 04 08:59:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 25bbf676-f42f-a624-a39b-ff8deef07eff (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2528bcb800, cur 1562255942 expire 1562255792 last 1562255715 Jul 04 08:59:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 08:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 25bbf676-f42f-a624-a39b-ff8deef07eff (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ceb2b4800, cur 1562255947 expire 1562255797 last 1562255720 Jul 04 08:59:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 04 08:59:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec76f1db-9c9b-bbe0-847f-90a9d517c8dc (at 10.8.9.8@o2ib6) Jul 04 09:06:23 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 09:06:23 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 769 previous similar messages Jul 04 09:17:13 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 09:17:13 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 659 previous similar messages Jul 04 09:27:17 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 09:27:17 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 643 previous similar messages Jul 04 09:37:41 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 09:37:41 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 567 previous similar messages Jul 04 09:49:02 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 09:49:02 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 718 previous similar messages Jul 04 09:59:04 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real 
grant 0 Jul 04 09:59:04 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 797 previous similar messages Jul 04 10:09:23 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 10:09:23 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 591 previous similar messages Jul 04 10:19:44 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 10:19:44 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 711 previous similar messages Jul 04 10:37:51 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 10:37:51 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 720 previous similar messages Jul 04 10:47:52 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 10:47:52 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 456 previous similar messages Jul 04 10:57:55 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 10:57:55 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 384 previous similar messages Jul 04 11:10:11 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 11:10:11 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 197 previous similar messages 
Jul 04 11:20:12 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 11:20:12 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 403 previous similar messages Jul 04 11:30:16 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 11:30:16 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 367 previous similar messages Jul 04 11:40:17 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 11:40:17 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 386 previous similar messages Jul 04 11:50:39 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 11:50:39 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 394 previous similar messages Jul 04 12:00:51 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 12:00:51 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 385 previous similar messages Jul 04 12:08:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8ef25a02-5cd5-8500-774d-d75ea76eaffd (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3f5ea65400, cur 1562267302 expire 1562267152 last 1562267075 Jul 04 12:10:59 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 12:10:59 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 356 previous similar messages Jul 04 12:21:00 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 12:21:00 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 696 previous similar messages Jul 04 12:31:08 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 12:31:08 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 418 previous similar messages Jul 04 12:41:10 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 04 12:41:10 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 336 previous similar messages Jul 04 12:51:23 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 12:51:23 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 279 previous similar messages Jul 04 13:02:58 fir-md1-s1 kernel: LustreError: 69435:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 13:02:58 fir-md1-s1 kernel: LustreError: 69435:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 331 previous similar messages Jul 04 13:13:19 fir-md1-s1 kernel: LustreError: 
46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 13:13:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 267 previous similar messages Jul 04 13:23:19 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 13:23:19 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 364 previous similar messages Jul 04 13:33:19 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 13:33:19 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 452 previous similar messages Jul 04 13:43:26 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 13:43:26 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 461 previous similar messages Jul 04 13:53:26 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 13:53:26 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 487 previous similar messages Jul 04 14:03:29 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 14:03:29 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 659 previous similar messages Jul 04 14:13:33 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 
14:13:33 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 445 previous similar messages Jul 04 14:23:34 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 14:23:34 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 460 previous similar messages Jul 04 14:33:35 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 14:33:35 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 502 previous similar messages Jul 04 14:43:40 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 14:43:40 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 466 previous similar messages Jul 04 14:53:44 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 14:53:44 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 543 previous similar messages Jul 04 15:03:52 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 15:03:52 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 513 previous similar messages Jul 04 15:13:52 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 15:13:52 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 477 previous similar messages Jul 04 15:23:53 
fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 15:23:53 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 460 previous similar messages Jul 04 15:33:59 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 15:33:59 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 402 previous similar messages Jul 04 15:44:05 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 15:44:05 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 506 previous similar messages Jul 04 15:46:02 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 15:54:05 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 15:54:05 fir-md1-s1 kernel: LustreError: 21537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 422 previous similar messages Jul 04 16:02:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jul 04 16:02:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 16:04:10 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 16:04:10 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 404 previous similar messages Jul 04 16:14:19 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: 
cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 16:14:19 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 372 previous similar messages Jul 04 16:24:22 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 16:24:22 fir-md1-s1 kernel: LustreError: 21714:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 377 previous similar messages Jul 04 16:34:29 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jul 04 16:34:29 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 538 previous similar messages Jul 04 16:41:13 fir-md1-s1 kernel: Lustre: 10589:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 16:44:34 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 16:44:34 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 389 previous similar messages Jul 04 16:46:58 fir-md1-s1 kernel: Lustre: 23625:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 16:46:58 fir-md1-s1 kernel: Lustre: 23625:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jul 04 16:54:37 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 16:54:37 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 450 previous similar messages Jul 04 17:04:38 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 17:04:38 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 528 previous similar messages Jul 04 17:12:26 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:12:26 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jul 04 17:14:43 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 17:14:43 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 518 previous similar messages Jul 04 17:24:37 fir-md1-s1 kernel: Lustre: 23561:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:25:00 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 17:25:00 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jul 04 17:26:18 fir-md1-s1 kernel: Lustre: 23600:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:26:18 fir-md1-s1 kernel: Lustre: 23600:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 04 17:29:41 fir-md1-s1 kernel: Lustre: 23623:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:35:01 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 17:35:01 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 515 previous similar messages 
Jul 04 17:38:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.104.69@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3c5a713180/0x5d9ee6305c19a6f9 lrc: 3/0,0 mode: PR/PR res: [0x2c002c23d:0x1c859:0x0].0x0 bits 0x58/0x0 rrc: 3 type: IBT flags: 0x60200400010020 nid: 10.9.104.69@o2ib4 remote: 0xc50f1e8ca834a497 expref: 6755 pid: 23731 timeout: 1402183 lvb_type: 0 Jul 04 17:38:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8c412c26-542a-dae3-c537-fda210938013 (at 10.9.104.69@o2ib4) Jul 04 17:38:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 17:43:51 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:43:51 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Jul 04 17:45:17 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 17:45:17 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 485 previous similar messages Jul 04 17:45:47 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:45:47 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jul 04 17:47:38 fir-md1-s1 kernel: Lustre: 23623:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 17:55:17 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 17:55:17 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 521 previous 
similar messages Jul 04 18:05:20 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 18:05:20 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 498 previous similar messages Jul 04 18:13:49 fir-md1-s1 kernel: Lustre: 10589:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 18:15:24 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 18:15:24 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 428 previous similar messages Jul 04 18:16:09 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 18:16:09 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19 previous similar messages Jul 04 18:16:28 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 18:23:26 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 18:23:26 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 91 previous similar messages Jul 04 18:23:50 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 18:23:50 fir-md1-s1 kernel: Lustre: 21370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Jul 04 18:24:15 fir-md1-s1 kernel: Lustre: 23653:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 18:24:15 fir-md1-s1 kernel: Lustre: 
23653:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages Jul 04 18:25:40 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 18:25:40 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 566 previous similar messages Jul 04 18:35:42 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 18:35:42 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 530 previous similar messages Jul 04 18:45:42 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 18:45:42 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 561 previous similar messages Jul 04 18:55:50 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 18:55:50 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 500 previous similar messages Jul 04 19:05:55 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 19:05:55 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 471 previous similar messages Jul 04 19:15:58 fir-md1-s1 kernel: LustreError: 46591:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 19:15:58 fir-md1-s1 kernel: LustreError: 46591:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 437 previous similar messages Jul 04 19:26:24 fir-md1-s1 kernel: LustreError: 
46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 19:26:24 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 348 previous similar messages Jul 04 19:36:24 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 19:36:24 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 394 previous similar messages Jul 04 19:46:32 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 19:46:32 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 373 previous similar messages Jul 04 19:52:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1f29c7a2-d2d3-0a98-27b0-578e87d088ab (at 10.8.9.2@o2ib6) Jul 04 19:56:36 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 19:56:36 fir-md1-s1 kernel: LustreError: 46572:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 429 previous similar messages Jul 04 20:06:52 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 20:06:52 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 432 previous similar messages Jul 04 20:16:57 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 20:16:57 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 490 previous similar messages Jul 04 20:26:58 fir-md1-s1 kernel: LustreError: 
21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 20:26:58 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 672 previous similar messages Jul 04 20:37:00 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 04 20:37:00 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 635 previous similar messages Jul 04 20:47:03 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 20:47:03 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 708 previous similar messages Jul 04 20:57:04 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 04 20:57:04 fir-md1-s1 kernel: LustreError: 21245:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 719 previous similar messages Jul 04 21:07:05 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 04 21:07:05 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 610 previous similar messages Jul 04 21:17:09 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 69632 GRANT, real grant 0 Jul 04 21:17:09 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 723 previous similar messages Jul 04 21:24:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 721f53d4-652b-e945-12ff-35ccdf15e929 (at 10.9.114.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f14c7772800, cur 1562300654 expire 1562300504 last 1562300427 Jul 04 21:24:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 21:27:11 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 21:27:11 fir-md1-s1 kernel: LustreError: 23106:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 708 previous similar messages Jul 04 21:37:12 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 21:37:12 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 763 previous similar messages Jul 04 21:47:13 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 21:47:13 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 740 previous similar messages Jul 04 21:47:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 721f53d4-652b-e945-12ff-35ccdf15e929 (at 10.9.114.15@o2ib4) Jul 04 21:47:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 21:57:15 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 21:57:15 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 631 previous similar messages Jul 04 22:07:19 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 22:07:19 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 319 previous similar messages Jul 04 22:17:20 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 22:17:20 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 293 previous similar messages Jul 04 22:27:24 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 22:27:24 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 238 previous similar messages Jul 04 22:37:35 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 22:37:35 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 235 previous similar messages Jul 04 22:44:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6afdb7fb-fdc2-6692-1bb2-94fc70f0b6ac (at 10.9.104.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252253d800, cur 1562305458 expire 1562305308 last 1562305231 Jul 04 22:47:47 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 22:47:47 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 209 previous similar messages Jul 04 22:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d4117728-4cc7-9876-91f7-8a96129f589f (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1656e72400, cur 1562306166 expire 1562306016 last 1562305939 Jul 04 22:56:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 22:57:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jul 04 22:57:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 22:57:51 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 22:57:51 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 317 previous similar messages Jul 04 23:08:31 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 23:08:31 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 160 previous similar messages Jul 04 23:14:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c6ff68eb-5fb8-a120-f19a-506df7ae12c5 (at 10.9.104.72@o2ib4) Jul 04 23:14:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 04 23:16:45 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 04 23:16:45 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 22 previous similar messages Jul 04 23:18:35 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 23:18:35 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 301 previous similar messages Jul 04 23:28:36 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 23:28:36 fir-md1-s1 kernel: 
LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 321 previous similar messages Jul 04 23:38:46 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 04 23:38:46 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 304 previous similar messages Jul 04 23:49:09 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 23:49:09 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 previous similar messages Jul 04 23:59:11 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 04 23:59:11 fir-md1-s1 kernel: LustreError: 25635:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 260 previous similar messages Jul 05 00:09:15 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 00:09:15 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 248 previous similar messages Jul 05 00:19:16 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 00:19:16 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 249 previous similar messages Jul 05 00:29:17 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 00:29:17 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 223 previous similar messages Jul 05 00:39:18 fir-md1-s1 kernel: 
LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 00:39:18 fir-md1-s1 kernel: LustreError: 21539:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 294 previous similar messages Jul 05 00:49:30 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 00:49:30 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 235 previous similar messages Jul 05 00:59:34 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 00:59:34 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 284 previous similar messages Jul 05 01:09:37 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 01:09:37 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 297 previous similar messages Jul 05 01:20:05 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 01:20:05 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 316 previous similar messages Jul 05 01:30:06 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 01:30:06 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 306 previous similar messages Jul 05 01:40:11 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real 
grant 0 Jul 05 01:40:11 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 275 previous similar messages Jul 05 01:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6a859b16-85f7-35c9-387f-f10b0648c129 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25019d5000, cur 1562316520 expire 1562316370 last 1562316293 Jul 05 01:48:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 01:49:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jul 05 01:49:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 01:50:39 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 01:50:39 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 290 previous similar messages Jul 05 02:00:40 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 02:00:40 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 previous similar messages Jul 05 02:10:40 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 02:10:40 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 330 previous similar messages Jul 05 02:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3c3d09fd-cece-8f71-77c9-8f6f333d8d68 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f13f73fe000, cur 1562318259 expire 1562318109 last 1562318032 Jul 05 02:17:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 02:18:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c6748fa-faf9-dbf4-7576-e7e488da698d (at 10.8.11.9@o2ib6) Jul 05 02:18:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 02:20:42 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 02:20:42 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 330 previous similar messages Jul 05 02:30:44 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 02:30:44 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 246 previous similar messages Jul 05 02:40:46 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 02:40:46 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 269 previous similar messages Jul 05 02:50:51 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 02:50:51 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 224 previous similar messages Jul 05 03:00:56 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 03:00:56 fir-md1-s1 kernel: LustreError: 46520:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 308 previous similar messages Jul 05 03:07:02 fir-md1-s1 kernel: Lustre: 
23605:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 05 03:11:04 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 03:11:04 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 205 previous similar messages Jul 05 03:21:05 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 03:21:05 fir-md1-s1 kernel: LustreError: 46543:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 196 previous similar messages Jul 05 03:31:06 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 03:31:06 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 246 previous similar messages Jul 05 03:35:41 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 05 03:41:09 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 03:41:09 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 179 previous similar messages Jul 05 03:51:16 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 03:51:16 fir-md1-s1 kernel: LustreError: 21686:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 224 previous similar messages Jul 05 04:01:28 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 
04:01:28 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 242 previous similar messages Jul 05 04:11:36 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 04:11:36 fir-md1-s1 kernel: LustreError: 21567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 233 previous similar messages Jul 05 04:16:57 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 05 04:16:57 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 33 previous similar messages Jul 05 04:17:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 065e2ad2-1d60-8b4d-b554-7a4284d83236 (at 10.8.1.7@o2ib6) reconnecting Jul 05 04:17:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6be5edeb-cbb9-a4d7-5f1b-a3072b83c552 (at 10.8.1.7@o2ib6) Jul 05 04:17:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 04:21:41 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 04:21:41 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 309 previous similar messages Jul 05 04:31:52 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 04:31:52 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 219 previous similar messages Jul 05 04:36:29 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 05 04:36:29 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 05 
04:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 691e4f7c-24cc-f758-5354-96c1b01f1439 (at 10.8.7.7@o2ib6) reconnecting Jul 05 04:36:36 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 05 04:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 41d886bd-dfcd-3155-cafa-8df75781f2df (at 10.8.7.7@o2ib6) Jul 05 04:36:36 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 05 04:40:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 05 04:40:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 05 04:40:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e45eae18-7cf5-c24e-ada4-411d043e0647 (at 10.8.7.19@o2ib6) reconnecting Jul 05 04:40:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6fb1a9aa-6234-c00b-63b2-a1a72639773f (at 10.8.7.19@o2ib6) Jul 05 04:41:54 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 04:41:54 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 351 previous similar messages Jul 05 04:52:06 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 04:52:06 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 266 previous similar messages Jul 05 04:57:38 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 05 05:02:10 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 05:02:10 fir-md1-s1 kernel: LustreError: 
46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 343 previous similar messages Jul 05 05:12:11 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 05:12:11 fir-md1-s1 kernel: LustreError: 27583:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 330 previous similar messages Jul 05 05:22:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 05:22:19 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 297 previous similar messages Jul 05 05:32:20 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 05:32:20 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 308 previous similar messages Jul 05 05:42:24 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 05:42:24 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 314 previous similar messages Jul 05 05:52:39 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 05:52:39 fir-md1-s1 kernel: LustreError: 70067:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 321 previous similar messages Jul 05 06:02:42 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 05 06:02:42 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 313 previous similar messages Jul 05 06:12:57 fir-md1-s1 kernel: LustreError: 
22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 06:12:57 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 319 previous similar messages Jul 05 06:22:59 fir-md1-s1 kernel: LustreError: 46591:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 05 06:22:59 fir-md1-s1 kernel: LustreError: 46591:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 335 previous similar messages Jul 05 06:33:08 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 06:33:08 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 262 previous similar messages Jul 05 06:39:29 fir-md1-s1 kernel: Lustre: 23634:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 05 06:43:11 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 06:43:11 fir-md1-s1 kernel: LustreError: 22428:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 303 previous similar messages Jul 05 06:53:16 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 06:53:16 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 279 previous similar messages Jul 05 07:03:20 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 07:03:20 fir-md1-s1 kernel: LustreError: 22958:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 257 previous similar messages Jul 05 07:04:54 fir-md1-s1 kernel: 
Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 05 07:13:24 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 07:13:24 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 260 previous similar messages Jul 05 07:23:24 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 07:23:24 fir-md1-s1 kernel: LustreError: 46521:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 295 previous similar messages Jul 05 07:33:24 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 07:33:24 fir-md1-s1 kernel: LustreError: 22990:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 371 previous similar messages Jul 05 07:43:29 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 05 07:43:29 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 326 previous similar messages Jul 05 07:49:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bec3d6e3-cbf4-befd-5ab3-86401c925d46 (at 10.9.0.63@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34ed637400, cur 1562338190 expire 1562338040 last 1562337963 Jul 05 07:49:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 07:49:50 fir-md1-s1 kernel: LustreError: 20384:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0784263c00 x1636724756545744/t0(0) o104->fir-MDT0002@10.9.0.63@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 05 07:53:30 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 07:53:30 fir-md1-s1 kernel: LustreError: 21484:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 279 previous similar messages Jul 05 07:55:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c4e03c7e-ac09-b1ad-c42c-11e5ce21ec84 (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1937e75800, cur 1562338532 expire 1562338382 last 1562338305 Jul 05 07:55:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:00:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6726fecc-3078-ba4a-fb68-64e928250f1f (at 10.9.102.31@o2ib4) Jul 05 08:00:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 08:03:32 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 08:03:32 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 325 previous similar messages Jul 05 08:13:34 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 05 08:13:34 fir-md1-s1 kernel: LustreError: 22989:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 327 previous similar messages Jul 05 08:18:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
29e229ef-0b7d-e0ce-48dd-1c614dad7928 (at 10.9.112.15@o2ib4) Jul 05 08:18:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:19:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 05 08:19:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:19:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e79f4448-e890-1954-0996-0a25890d8ee5 (at 10.9.112.14@o2ib4) Jul 05 08:19:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 08:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 05 08:19:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:20:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2b30f37f-5bb9-7326-9800-1fc222ceb47c (at 10.9.106.61@o2ib4) Jul 05 08:21:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to df993956-2257-9a73-35ef-341b2f75d156 (at 10.9.106.58@o2ib4) Jul 05 08:21:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:22:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cffa9ca6-4860-be91-20b9-abd21a031d37 (at 10.9.108.4@o2ib4) Jul 05 08:22:09 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 05 08:23:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Jul 05 08:23:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 05 08:23:38 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 08:23:38 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 305 previous similar messages Jul 05 08:25:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8b986bcb-1a7e-3434-c2fb-c6a130bf7611 (at 10.9.104.25@o2ib4) Jul 05 08:25:21 fir-md1-s1 
kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:33:43 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 08:33:43 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 355 previous similar messages Jul 05 08:41:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7ab2f51d-a689-9f2c-be74-3bf003bf5840 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f233c7fe800, cur 1562341269 expire 1562341119 last 1562341042 Jul 05 08:41:09 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 05 08:43:48 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 08:43:48 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 357 previous similar messages Jul 05 08:50:36 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f2665e59-4b86-9898-62f9-cc1d6be44c9d (at 10.9.101.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3be27ca000, cur 1562341836 expire 1562341686 last 1562341609 Jul 05 08:50:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 08:50:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 03b80d11-11fc-47d1-78d0-c1090191edd3 (at 10.9.101.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2844712c00, cur 1562341842 expire 1562341692 last 1562341615 Jul 05 08:50:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 08:51:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02ea0e3d-c72b-2664-4a33-3841a13fb806 (at 10.9.101.55@o2ib4) Jul 05 08:51:02 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 05 08:53:59 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 08:53:59 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 298 previous similar messages Jul 05 09:04:01 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 09:04:01 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 270 previous similar messages Jul 05 09:14:09 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 09:14:09 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 335 previous similar messages Jul 05 09:17:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jul 05 09:17:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 09:24:10 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 09:24:10 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 270 previous similar messages Jul 05 09:34:12 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 
09:34:12 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 227 previous similar messages Jul 05 09:40:02 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562344795/real 1562344795] req@ffff8f1a7b7d8300 x1636724902610976/t0(0) o106->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562344802 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 05 09:40:02 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 05 09:40:10 fir-md1-s1 kernel: Lustre: 21446:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a480fa400 x1634291378161312/t0(0) o101->9081d826-2f83-5b46-ff73-7e6473184838@10.8.17.25@o2ib6:15/0 lens 480/568 e 1 to 0 dl 1562344815 ref 2 fl Interpret:/0/0 rc 0/0 Jul 05 09:40:23 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562344816/real 1562344816] req@ffff8f1a7b7d8300 x1636724902610976/t0(0) o106->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562344823 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 09:40:23 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 05 09:40:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9081d826-2f83-5b46-ff73-7e6473184838 (at 10.8.17.25@o2ib6) reconnecting Jul 05 09:40:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 09:40:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 955a011e-49f5-8ef4-d629-f5f3f5327d18 (at 10.8.17.25@o2ib6) Jul 05 09:40:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 09:40:59 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562344851/real 1562344851] 
req@ffff8f1a7b7d8300 x1636724902610976/t0(0) o106->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562344858 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 09:40:59 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 05 09:42:09 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562344922/real 1562344922] req@ffff8f1a7b7d8300 x1636724902610976/t0(0) o106->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562344929 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 09:42:09 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 05 09:42:21 fir-md1-s1 kernel: Lustre: 50444:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23715eb000 x1634186985277856/t0(0) o101->195f63e6-6435-e156-0d15-900ee8f39a3e@10.9.109.53@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1562344946 ref 2 fl Interpret:/0/0 rc 0/0 Jul 05 09:42:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 195f63e6-6435-e156-0d15-900ee8f39a3e (at 10.9.109.53@o2ib4) reconnecting Jul 05 09:42:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0c108c2d-a344-eca5-b660-99391625b78d (at 10.9.109.53@o2ib4) Jul 05 09:42:41 fir-md1-s1 kernel: LustreError: 50447:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.0.62@o2ib4) failed to reply to blocking AST (req@ffff8f1c47f2ec00 x1636724903797024 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f199ebafbc0/0x5d9ee631ae8958f4 lrc: 4/0,0 mode: PR/PR res: [0x2c002c05f:0xe02b:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.9.0.62@o2ib4 remote: 0x33d88ec1c734c2ba expref: 105515 pid: 23662 timeout: 1460043 lvb_type: 0 Jul 05 09:42:41 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.0.62@o2ib4 was evicted due to a lock 
blocking callback time out: rc -110 Jul 05 09:42:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f199ebafbc0/0x5d9ee631ae8958f4 lrc: 3/0,0 mode: PR/PR res: [0x2c002c05f:0xe02b:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.9.0.62@o2ib4 remote: 0x33d88ec1c734c2ba expref: 105516 pid: 23662 timeout: 0 lvb_type: 0 Jul 05 09:42:41 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (79:87s); client may timeout. req@ffff8f1a480fa400 x1634291378161312/t0(0) o101->9081d826-2f83-5b46-ff73-7e6473184838@10.8.17.25@o2ib6:15/0 lens 480/536 e 1 to 0 dl 1562344874 ref 1 fl Complete:/0/0 rc 301/301 Jul 05 09:42:41 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jul 05 09:42:58 fir-md1-s1 kernel: LustreError: 23737:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f3b4bffb300 x1636724904486656/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 05 09:43:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148ebd6000, cur 1562345002 expire 1562344852 last 1562344775 Jul 05 09:43:23 fir-md1-s1 kernel: Lustre: 23630:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3a5dcf7200 x1634186985352544/t0(0) o101->195f63e6-6435-e156-0d15-900ee8f39a3e@10.9.109.53@o2ib4:28/0 lens 480/568 e 0 to 0 dl 1562345008 ref 2 fl Interpret:/0/0 rc 0/0 Jul 05 09:43:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3cd15e44-adf1-e977-3310-908c278e7f22 (at 10.8.0.68@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1f39887c00, cur 1562345007 expire 1562344857 last 1562344780 Jul 05 09:43:27 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 05 09:43:27 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f195805d340/0x5d9ee631af012926 lrc: 3/0,0 mode: PR/PR res: [0x2c002c05f:0xe042:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x60000400000020 nid: 10.9.0.62@o2ib4 remote: 0x33d88ec1c745cfe2 expref: 7391 pid: 27316 timeout: 1460067 lvb_type: 0 Jul 05 09:43:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 195f63e6-6435-e156-0d15-900ee8f39a3e (at 10.9.109.53@o2ib4) reconnecting Jul 05 09:43:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0c108c2d-a344-eca5-b660-99391625b78d (at 10.9.109.53@o2ib4) Jul 05 09:44:14 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 05 09:44:14 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 278 previous similar messages Jul 05 09:54:17 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 09:54:17 fir-md1-s1 kernel: LustreError: 46551:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 196 previous similar messages Jul 05 10:04:51 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 10:04:51 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 224 previous similar messages Jul 05 10:15:01 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf 
claims 155648 GRANT, real grant 0 Jul 05 10:15:01 fir-md1-s1 kernel: LustreError: 20499:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 165 previous similar messages Jul 05 10:17:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jul 05 10:25:02 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 05 10:25:02 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 169 previous similar messages Jul 05 10:26:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jul 05 10:26:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 10:35:03 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 10:35:03 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 287 previous similar messages Jul 05 10:42:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bac6cd4e-a755-0f0e-da6d-e2c740eb12ce (at 10.9.114.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f05034fd800, cur 1562348573 expire 1562348423 last 1562348346 Jul 05 10:45:05 fir-md1-s1 kernel: LustreError: 21616:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 10:45:05 fir-md1-s1 kernel: LustreError: 21616:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 174 previous similar messages Jul 05 10:55:09 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 10:55:09 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 213 previous similar messages Jul 05 11:05:20 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 11:05:20 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 246 previous similar messages Jul 05 11:08:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 721f53d4-652b-e945-12ff-35ccdf15e929 (at 10.9.114.15@o2ib4) Jul 05 11:08:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 11:15:26 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 11:15:26 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 254 previous similar messages Jul 05 11:25:33 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 11:25:33 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 314 previous similar messages Jul 05 11:35:34 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, 
real grant 0 Jul 05 11:35:34 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 338 previous similar messages Jul 05 11:45:36 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 11:45:36 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 314 previous similar messages Jul 05 11:55:44 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 11:55:44 fir-md1-s1 kernel: LustreError: 46530:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 295 previous similar messages Jul 05 12:05:54 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 12:05:54 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 305 previous similar messages Jul 05 12:15:58 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 12:15:58 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 289 previous similar messages Jul 05 12:26:01 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 12:26:01 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 289 previous similar messages Jul 05 12:36:02 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 12:36:02 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 230 previous similar 
messages Jul 05 12:46:04 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 12:46:04 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 263 previous similar messages Jul 05 12:56:07 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 12:56:07 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 263 previous similar messages Jul 05 13:06:24 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jul 05 13:06:24 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 277 previous similar messages Jul 05 13:16:25 fir-md1-s1 kernel: LustreError: 81718:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 13:16:25 fir-md1-s1 kernel: LustreError: 81718:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 256 previous similar messages Jul 05 13:26:27 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 13:26:27 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 247 previous similar messages Jul 05 13:30:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9bcb994b-3f25-af85-c843-3a1243f52dea (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2097efec00, cur 1562358623 expire 1562358473 last 1562358396 Jul 05 13:30:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 13:30:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to acd26ab4-a020-fbc0-1a40-f0e7d759131f (at 10.8.23.14@o2ib6) Jul 05 13:30:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 13:36:32 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 13:36:32 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 238 previous similar messages Jul 05 13:46:52 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 13:46:52 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 186 previous similar messages Jul 05 13:56:56 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 13:56:56 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 224 previous similar messages Jul 05 14:07:09 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 14:07:09 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 264 previous similar messages Jul 05 14:17:15 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 14:17:15 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 233 previous similar messages Jul 05 14:25:47 fir-md1-s1 kernel: Lustre: 
23571:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:25:48 fir-md1-s1 kernel: Lustre: 23554:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:25:50 fir-md1-s1 kernel: Lustre: 23554:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:25:53 fir-md1-s1 kernel: Lustre: 23649:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:25:53 fir-md1-s1 kernel: Lustre: 23649:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 1 previous similar message Jul 05 14:25:58 fir-md1-s1 kernel: Lustre: 23561:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:25:58 fir-md1-s1 kernel: Lustre: 23561:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 2 previous similar messages Jul 05 14:26:06 fir-md1-s1 kernel: Lustre: 23649:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:26:06 fir-md1-s1 kernel: Lustre: 23649:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 4 previous similar messages Jul 05 14:26:24 fir-md1-s1 kernel: Lustre: 23660:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:26:24 fir-md1-s1 kernel: Lustre: 23660:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 9 previous similar messages Jul 05 14:26:56 fir-md1-s1 kernel: Lustre: 21417:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:26:56 fir-md1-s1 kernel: Lustre: 21417:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 18 previous similar messages Jul 05 14:27:20 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 14:27:20 
fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 190 previous similar messages Jul 05 14:28:02 fir-md1-s1 kernel: Lustre: 23649:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:28:02 fir-md1-s1 kernel: Lustre: 23649:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 37 previous similar messages Jul 05 14:30:11 fir-md1-s1 kernel: Lustre: 21411:0:(llog_cat.c:874:llog_cat_process_or_fork()) fir-MDD0002: catlog [0x5:0xa:0x0] crosses index zero Jul 05 14:30:11 fir-md1-s1 kernel: Lustre: 21411:0:(llog_cat.c:874:llog_cat_process_or_fork()) Skipped 71 previous similar messages Jul 05 14:37:20 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 14:37:20 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 350 previous similar messages Jul 05 14:47:34 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 14:47:34 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 256 previous similar messages Jul 05 14:57:39 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 14:57:39 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 332 previous similar messages Jul 05 15:07:58 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 15:07:58 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 325 previous similar messages Jul 05 15:17:58 fir-md1-s1 kernel: LustreError: 
20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 15:17:58 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 289 previous similar messages Jul 05 15:27:59 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 15:27:59 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 315 previous similar messages Jul 05 15:38:03 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 15:38:03 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 197 previous similar messages Jul 05 15:48:16 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 15:48:16 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 299 previous similar messages Jul 05 15:53:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 07f1d5f5-28d8-ec0b-6253-6164c1e142a5 (at 10.9.107.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14ee00c800, cur 1562367230 expire 1562367080 last 1562367003 Jul 05 15:53:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 15:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 07f1d5f5-28d8-ec0b-6253-6164c1e142a5 (at 10.9.107.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fc809c00, cur 1562367245 expire 1562367095 last 1562367018 Jul 05 15:58:17 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 15:58:17 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 300 previous similar messages Jul 05 16:08:33 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 16:08:33 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 221 previous similar messages Jul 05 16:12:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4f15da91-4546-507e-8c99-9e08b5e219a4 (at 10.8.15.10@o2ib6) Jul 05 16:12:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 16:18:47 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 16:18:47 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 193 previous similar messages Jul 05 16:28:53 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 16:28:53 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 337 previous similar messages Jul 05 16:38:53 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 16:38:53 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 244 previous similar messages Jul 05 16:49:00 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, 
real grant 0 Jul 05 16:49:00 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 198 previous similar messages Jul 05 16:59:02 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 16:59:02 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 244 previous similar messages Jul 05 17:09:02 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 17:09:02 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 286 previous similar messages Jul 05 17:19:14 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 17:19:14 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 153 previous similar messages Jul 05 17:29:15 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 17:29:15 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 282 previous similar messages Jul 05 17:39:43 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 17:39:43 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 322 previous similar messages Jul 05 17:49:43 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 17:49:43 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 325 previous similar 
messages Jul 05 17:59:45 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 17:59:45 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 309 previous similar messages Jul 05 18:09:46 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 18:09:46 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 345 previous similar messages Jul 05 18:20:33 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 18:20:33 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 314 previous similar messages Jul 05 18:30:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 18:30:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 270 previous similar messages Jul 05 18:40:43 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 18:40:43 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 262 previous similar messages Jul 05 18:50:47 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 18:50:47 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 221 previous similar messages Jul 05 19:00:48 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 05 19:00:48 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 291 previous similar messages Jul 05 19:11:11 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 19:11:11 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 225 previous similar messages Jul 05 19:21:13 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 19:21:13 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 previous similar messages Jul 05 19:31:15 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 19:31:15 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 220 previous similar messages Jul 05 19:36:16 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562380569/real 1562380569] req@ffff8f1f19f2ec00 x1636725341747536/t0(0) o104->fir-MDT0000@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562380576 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 05 19:36:16 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 05 19:36:24 fir-md1-s1 kernel: Lustre: 97638:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2229669200 x1631551722727968/t412995623177(0) o36->78ab2c22-394d-bdd4-0b8e-3553d6a47e28@10.8.17.2@o2ib6:29/0 lens 488/3152 e 1 to 0 dl 1562380589 ref 2 fl Interpret:/0/0 rc 0/0 Jul 05 19:36:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
78ab2c22-394d-bdd4-0b8e-3553d6a47e28 (at 10.8.17.2@o2ib6) reconnecting Jul 05 19:36:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9b9b8332-39fb-197d-4c4c-38d36ae981cd (at 10.8.17.2@o2ib6) Jul 05 19:36:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 19:36:36 fir-md1-s1 kernel: Lustre: 23702:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562380589/real 1562380589] req@ffff8f41cca51b00 x1636725341976576/t0(0) o106->fir-MDT0000@10.8.0.66@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562380596 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 05 19:36:36 fir-md1-s1 kernel: Lustre: 23702:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 05 19:36:44 fir-md1-s1 kernel: LustreError: 97656:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.0.66@o2ib6) failed to reply to blocking AST (req@ffff8f1f19f2ec00 x1636725341747536 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f17a7279440/0x5d9ee6328c624984 lrc: 4/0,0 mode: PR/PR res: [0x200029c29:0x17d:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xffcd23c129cbc86f expref: 2404 pid: 21434 timeout: 1495686 lvb_type: 0 Jul 05 19:36:44 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.0.66@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 05 19:36:44 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.0.66@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f17a7279440/0x5d9ee6328c624984 lrc: 3/0,0 mode: PR/PR res: [0x200029c29:0x17d:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xffcd23c129cbc86f expref: 2405 pid: 21434 timeout: 0 lvb_type: 0 Jul 05 19:36:44 fir-md1-s1 kernel: LustreError: 24582:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED 
req@ffff8f1fbe695d00 x1636725342147392/t0(0) o104->fir-MDT0000@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 05 19:39:39 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 08dbb8a3-6486-471a-a832-58e0c151a878 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2438cf0400, cur 1562380779 expire 1562380629 last 1562380552 Jul 05 19:39:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 19:41:15 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 19:41:15 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 247 previous similar messages Jul 05 19:51:23 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 19:51:23 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 200 previous similar messages Jul 05 20:01:33 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 20:01:33 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 221 previous similar messages Jul 05 20:03:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 048975c6-ab6c-4dc2-089d-bee623fa3e4d (at 10.9.114.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2505b8d000, cur 1562382211 expire 1562382061 last 1562381984 Jul 05 20:03:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 20:03:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 048975c6-ab6c-4dc2-089d-bee623fa3e4d (at 10.9.114.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1477697c00, cur 1562382212 expire 1562382062 last 1562381985 Jul 05 20:03:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 05 20:11:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 20:11:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 228 previous similar messages Jul 05 20:21:40 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 20:21:40 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 274 previous similar messages Jul 05 20:28:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.113.10@o2ib4) Jul 05 20:29:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to beb38144-d000-b47c-bba7-ccce9e6df4a5 (at 10.9.114.10@o2ib4) Jul 05 20:29:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 20:31:41 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 05 20:31:41 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 235 previous similar messages Jul 05 20:41:43 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 20:41:43 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 350 previous similar messages Jul 05 20:51:46 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 20:51:46 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 286 previous similar 
messages Jul 05 21:01:47 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 21:01:47 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 246 previous similar messages Jul 05 21:11:53 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 21:11:53 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 334 previous similar messages Jul 05 21:22:06 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 21:22:06 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 307 previous similar messages Jul 05 21:32:16 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 21:32:16 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 228 previous similar messages Jul 05 21:42:17 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 21:42:17 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 273 previous similar messages Jul 05 21:51:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cc538e45-b702-a36c-5f06-e62f44bf19d0 (at 10.8.17.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f450665fc00, cur 1562388698 expire 1562388548 last 1562388471 Jul 05 21:51:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 05 21:52:20 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 05 21:52:20 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 265 previous similar messages Jul 05 22:02:23 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 22:02:23 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 293 previous similar messages Jul 05 22:12:29 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 22:12:29 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 165 previous similar messages Jul 05 22:22:35 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 22:22:35 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 251 previous similar messages Jul 05 22:32:36 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 22:32:36 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 276 previous similar messages Jul 05 22:42:36 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 22:42:36 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 283 previous similar 
messages Jul 05 22:52:45 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 22:52:45 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 230 previous similar messages Jul 05 23:02:47 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 23:02:47 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 287 previous similar messages Jul 05 23:12:50 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 23:12:50 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 241 previous similar messages Jul 05 23:22:54 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 23:22:54 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 360 previous similar messages Jul 05 23:32:59 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 23:32:59 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 261 previous similar messages Jul 05 23:43:00 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 05 23:43:00 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 311 previous similar messages Jul 05 23:49:47 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for 
slow reply: [sent 1562395780/real 1562395780] req@ffff8f12b6453000 x1636725417782288/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562395787 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 05 23:49:47 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 05 23:49:54 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562395787/real 1562395787] req@ffff8f12b6453000 x1636725417782288/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562395794 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 23:49:55 fir-md1-s1 kernel: Lustre: 21417:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a72f19e00 x1637014291926288/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:0/0 lens 520/568 e 1 to 0 dl 1562395800 ref 2 fl Interpret:/0/0 rc 0/0 Jul 05 23:50:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:50:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:50:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 23:50:08 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562395801/real 1562395801] req@ffff8f12b6453000 x1636725417782288/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562395808 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 23:50:08 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 05 23:50:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:50:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:50:29 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562395822/real 1562395822] req@ffff8f12b6453000 x1636725417782288/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562395829 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 23:50:29 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 05 23:50:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:50:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:51:04 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562395857/real 1562395857] req@ffff8f12b6453000 x1636725417782288/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562395864 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 23:51:04 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 05 23:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:51:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:51:47 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6ee172d9-72a9-7fa2-230d-3850214207fa (at 10.0.10.3@o2ib7) reconnecting Jul 05 23:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Jul 05 23:52:14 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562395927/real 1562395927] req@ffff8f12b6453000 x1636725417782288/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562395934 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 05 23:52:14 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 05 23:52:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 579568b0-fc84-54e9-66a5-a75bc316659b (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22e81e0000, cur 1562395943 expire 1562395793 last 1562395716 Jul 05 23:52:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 05 23:53:07 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 05 23:53:07 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 315 previous similar messages Jul 06 00:03:17 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 00:03:17 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 301 previous similar messages Jul 06 00:13:19 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 00:13:19 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 248 previous similar messages Jul 06 00:23:23 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 00:23:23 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 231 previous similar messages Jul 06 00:33:30 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 00:33:30 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 227 previous similar messages Jul 06 00:43:59 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 00:43:59 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 279 previous similar 
messages Jul 06 00:54:00 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 00:54:00 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 213 previous similar messages Jul 06 01:04:01 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 01:04:01 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 225 previous similar messages Jul 06 01:14:19 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 01:14:19 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 216 previous similar messages Jul 06 01:24:22 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 01:24:22 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 306 previous similar messages Jul 06 01:34:29 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 01:34:29 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 248 previous similar messages Jul 06 01:44:30 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 01:44:30 fir-md1-s1 kernel: LustreError: 27605:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 275 previous similar messages Jul 06 01:54:47 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 01:54:47 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 218 previous similar messages Jul 06 02:04:50 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 02:04:50 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 294 previous similar messages Jul 06 02:14:54 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 06 02:14:54 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 279 previous similar messages Jul 06 02:25:21 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 02:25:21 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 350 previous similar messages Jul 06 02:35:22 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 02:35:22 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 335 previous similar messages Jul 06 02:45:30 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 02:45:30 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 333 previous similar messages Jul 06 02:55:37 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 02:55:37 fir-md1-s1 kernel: LustreError: 
21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 277 previous similar messages Jul 06 03:06:06 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 03:06:06 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 291 previous similar messages Jul 06 03:16:07 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 03:16:07 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 289 previous similar messages Jul 06 03:26:31 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 06 03:26:31 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 248 previous similar messages Jul 06 03:36:35 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 03:36:35 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 282 previous similar messages Jul 06 03:46:38 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 03:46:38 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 254 previous similar messages Jul 06 03:56:40 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 03:56:40 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 303 previous similar messages Jul 06 04:06:59 fir-md1-s1 kernel: LustreError: 
46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 04:06:59 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 272 previous similar messages Jul 06 04:15:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d7baa7ce-5705-6e23-2846-5c2b64fab1c8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f172772fc00, cur 1562411703 expire 1562411553 last 1562411476 Jul 06 04:15:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 04:15:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jul 06 04:17:02 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 06 04:17:02 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 218 previous similar messages Jul 06 04:27:34 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 04:27:34 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 239 previous similar messages Jul 06 04:37:40 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 04:37:40 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 245 previous similar messages Jul 06 04:47:41 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 04:47:41 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 170 previous similar messages Jul 06 04:57:46 
fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 04:57:46 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 219 previous similar messages Jul 06 05:07:47 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 05:07:47 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 254 previous similar messages Jul 06 05:17:52 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 05:17:52 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 376 previous similar messages Jul 06 05:28:00 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 05:28:00 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 204 previous similar messages Jul 06 05:38:17 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 05:38:17 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 368 previous similar messages Jul 06 05:48:20 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 05:48:20 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 353 previous similar messages Jul 06 05:58:21 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 
155648 GRANT, real grant 0 Jul 06 05:58:21 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 260 previous similar messages Jul 06 06:08:22 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 06:08:22 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 290 previous similar messages Jul 06 06:18:26 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 06:18:26 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 319 previous similar messages Jul 06 06:28:32 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 06:28:32 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 287 previous similar messages Jul 06 06:38:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 06:38:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 268 previous similar messages Jul 06 06:48:43 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 06:48:43 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 256 previous similar messages Jul 06 06:58:46 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 06:58:46 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 268 previous 
similar messages Jul 06 07:08:55 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 07:08:55 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 256 previous similar messages Jul 06 07:19:04 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 07:19:04 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 213 previous similar messages Jul 06 07:29:19 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 07:29:19 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 239 previous similar messages Jul 06 07:39:24 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 07:39:24 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 237 previous similar messages Jul 06 07:49:49 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jul 06 07:49:49 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 236 previous similar messages Jul 06 07:58:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1cdcf44c-092e-67dd-29a2-3cb7e9bc7e29 (at 10.8.15.6@o2ib6) Jul 06 07:58:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 08:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ea8d1cad-7733-1759-3045-271c39c8bfa7 (at 10.9.114.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3a8b77e400, cur 1562425246 expire 1562425096 last 1562425019 Jul 06 08:00:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 08:01:06 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 08:01:06 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 284 previous similar messages Jul 06 08:11:13 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 08:11:13 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 319 previous similar messages Jul 06 08:22:05 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 08:22:05 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 263 previous similar messages Jul 06 08:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9efc11a2-2302-21f2-1382-b7d75650f9a7 (at 10.9.113.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f13b5844800, cur 1562426534 expire 1562426384 last 1562426307 Jul 06 08:22:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 08:27:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to beb38144-d000-b47c-bba7-ccce9e6df4a5 (at 10.9.114.10@o2ib4) Jul 06 08:27:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 08:32:05 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 08:32:05 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 279 previous similar messages Jul 06 08:42:11 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 08:42:11 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 269 previous similar messages Jul 06 08:47:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.113.10@o2ib4) Jul 06 08:47:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 08:53:24 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 08:53:24 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 326 previous similar messages Jul 06 09:03:25 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 09:03:25 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 244 previous similar messages Jul 06 09:04:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jul 06 09:04:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 09:08:14 
fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d6c51075-12c4-bfee-f317-56a8e3a97c90 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e48a01000, cur 1562429294 expire 1562429144 last 1562429067 Jul 06 09:08:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 09:08:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jul 06 09:08:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 09:13:25 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 09:13:25 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 232 previous similar messages Jul 06 09:23:28 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 94208 GRANT, real grant 0 Jul 06 09:23:28 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 357 previous similar messages Jul 06 09:33:49 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 09:33:49 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 212 previous similar messages Jul 06 09:44:10 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 09:44:10 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 176 previous similar messages Jul 06 09:54:14 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 09:54:14 fir-md1-s1 kernel: 
LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 204 previous similar messages Jul 06 10:04:16 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 10:04:16 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 286 previous similar messages Jul 06 10:14:58 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 10:14:58 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 196 previous similar messages Jul 06 10:25:46 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 10:25:46 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 221 previous similar messages Jul 06 10:35:50 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 10:35:50 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 254 previous similar messages Jul 06 10:46:34 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 10:46:34 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 232 previous similar messages Jul 06 10:56:42 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 10:56:42 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 260 previous similar messages Jul 06 11:06:42 fir-md1-s1 kernel: 
LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 11:06:42 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 179 previous similar messages Jul 06 11:16:50 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 11:16:50 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 264 previous similar messages Jul 06 11:27:14 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 11:27:14 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 243 previous similar messages Jul 06 11:37:22 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 11:37:22 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 207 previous similar messages Jul 06 11:47:25 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 11:47:25 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 191 previous similar messages Jul 06 11:57:50 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 11:57:50 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 191 previous similar messages Jul 06 12:08:35 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real 
grant 0 Jul 06 12:08:35 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 161 previous similar messages Jul 06 12:18:39 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 12:18:39 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 237 previous similar messages Jul 06 12:28:57 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 12:28:57 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 110 previous similar messages Jul 06 12:38:57 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 12:38:57 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 98 previous similar messages Jul 06 12:49:12 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 12:49:12 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 131 previous similar messages Jul 06 13:00:10 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 06 13:00:10 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 74 previous similar messages Jul 06 13:10:13 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 06 13:10:13 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 74 previous similar messages Jul 06 
13:20:52 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 06 13:20:52 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 120 previous similar messages Jul 06 13:31:02 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 13:31:02 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 67 previous similar messages Jul 06 13:41:07 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 06 13:41:07 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 58 previous similar messages Jul 06 13:55:11 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 06 13:55:11 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 36 previous similar messages Jul 06 14:06:37 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 06 14:06:37 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 06 14:19:59 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 32768 GRANT, real grant 0 Jul 06 14:19:59 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 06 14:31:12 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 
28672 GRANT, real grant 0 Jul 06 14:31:12 fir-md1-s1 kernel: LustreError: 42895:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 14:46:42 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 14:46:42 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 06 14:57:09 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 14:57:09 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 25 previous similar messages Jul 06 15:07:44 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 15:07:44 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 15:34:05 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 15:34:05 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 15:38:35 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 06 15:38:35 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 06 15:43:15 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 15:43:15 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar 
messages Jul 06 15:49:20 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 15:49:20 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 06 16:00:03 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 16:00:03 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 06 16:10:43 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 16:10:43 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 21 previous similar messages Jul 06 16:20:47 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 16:20:47 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 16:30:50 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 16:30:50 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 06 16:41:45 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 16:41:45 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 16:51:53 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 16:51:53 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 06 17:01:54 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 17:01:54 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 17:12:18 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 17:12:18 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jul 06 17:23:13 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 17:23:13 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 06 17:33:30 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 17:33:30 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 06 17:43:42 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 17:43:42 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 06 17:53:49 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 17:53:49 fir-md1-s1 kernel: LustreError: 
20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 18:04:37 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 18:04:37 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 18:14:48 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 18:14:48 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 18:25:03 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 18:25:03 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 06 18:35:48 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 18:35:48 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 18:44:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 43d4db84-f4df-a5c1-f438-2ed5ad3ddb7d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2429a3e800, cur 1562463857 expire 1562463707 last 1562463630 Jul 06 18:44:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 18:44:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Jul 06 18:44:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 06 18:46:23 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 18:46:23 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 18:57:07 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 18:57:07 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 19:07:28 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 19:07:28 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 06 19:17:53 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 19:17:53 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jul 06 19:28:28 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 19:28:28 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 19:38:52 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 19:38:52 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 19:49:16 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 19:49:16 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 19:59:35 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 19:59:35 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 20:10:08 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 20:10:08 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 20:20:41 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 20:20:41 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 20:31:15 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 20:31:15 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 20:42:23 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 20:42:23 fir-md1-s1 kernel: LustreError: 
44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 20:52:40 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 20:52:40 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 06 21:03:01 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 21:03:01 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 06 21:13:58 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 21:13:58 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 06 21:25:11 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 21:25:11 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 06 21:35:34 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 21:35:34 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 06 21:46:48 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 21:46:48 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 06 21:57:13 fir-md1-s1 kernel: LustreError: 
20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 21:57:13 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 9 previous similar messages Jul 06 22:07:16 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 22:07:16 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 22:17:40 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 22:17:40 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 06 22:28:29 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 22:28:29 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 06 22:39:05 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 22:39:05 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 06 22:50:10 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 22:50:10 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 06 23:00:45 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 
23:00:45 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 06 23:10:59 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 23:10:59 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 06 23:21:44 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 23:21:44 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 06 23:32:14 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 23:32:14 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 06 23:42:35 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 23:42:35 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 11 previous similar messages Jul 06 23:53:21 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 06 23:53:21 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 15 previous similar messages Jul 07 00:03:41 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 00:03:41 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 07 00:14:01 
fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 00:14:01 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 07 00:24:21 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 00:24:21 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 07 00:35:14 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 00:35:14 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 19 previous similar messages Jul 07 00:46:06 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 00:46:06 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 18 previous similar messages Jul 07 00:56:10 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 00:56:10 fir-md1-s1 kernel: LustreError: 20503:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 21 previous similar messages Jul 07 01:18:25 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 01:18:25 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 10 previous similar messages Jul 07 01:20:42 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 
GRANT, real grant 0 Jul 07 01:20:42 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 01:32:20 fir-md1-s1 kernel: LustreError: 22059:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 01:46:29 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 02:20:09 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 02:20:09 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 93 previous similar messages Jul 07 02:21:36 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 02:32:04 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 02:41:01 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 02:41:01 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 03:17:44 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 03:25:04 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 03:40:19 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 
GRANT, real grant 0 Jul 07 03:44:04 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 03:44:46 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 07 03:44:46 fir-md1-s1 kernel: LustreError: 46526:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 31 previous similar messages Jul 07 03:46:37 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 07 03:46:37 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 20 previous similar messages Jul 07 03:50:25 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 07 03:50:25 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 04:39:04 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 07 04:39:04 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 06:56:47 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 07:09:12 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:09:42 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:09:47 fir-md1-s1 
kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:09:47 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 07:10:17 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:10:17 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 07:10:22 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:11:02 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:11:02 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 07:11:32 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:11:32 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 07 07:12:35 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:12:35 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 07 07:14:21 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:14:21 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 07 
07:17:17 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:17:17 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 07 07:22:32 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:22:32 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 07 07:32:45 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:32:45 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 41 previous similar messages Jul 07 07:43:04 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:43:04 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 50 previous similar messages Jul 07 07:53:06 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 07:53:06 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 44 previous similar messages Jul 07 08:03:23 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 08:03:23 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 50 previous similar messages Jul 07 08:13:40 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 
GRANT, real grant 0 Jul 07 08:13:40 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 48 previous similar messages Jul 07 08:23:58 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 08:23:58 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 44 previous similar messages Jul 07 08:34:16 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 08:34:16 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43 previous similar messages Jul 07 08:44:24 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 08:44:24 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 45 previous similar messages Jul 07 08:54:52 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 08:54:52 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 47 previous similar messages Jul 07 09:05:09 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 09:05:09 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43 previous similar messages Jul 07 09:15:20 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 07 09:15:20 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 41 previous similar messages 
Jul 07 09:25:49 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 07 09:25:49 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 44 previous similar messages Jul 07 09:35:51 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 09:35:51 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 45 previous similar messages Jul 07 09:46:08 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 09:46:08 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 44 previous similar messages Jul 07 09:56:35 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 09:56:35 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 52 previous similar messages Jul 07 10:06:39 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 10:06:39 fir-md1-s1 kernel: LustreError: 21389:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 45 previous similar messages Jul 07 10:16:56 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 10:16:56 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 49 previous similar messages Jul 07 10:26:57 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 
claims 28672 GRANT, real grant 0 Jul 07 10:26:57 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 43 previous similar messages Jul 07 10:37:14 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 10:37:14 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 45 previous similar messages Jul 07 10:47:21 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 10:47:21 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 48 previous similar messages Jul 07 10:57:28 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 07 10:57:28 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 33 previous similar messages Jul 07 11:26:39 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 11:26:39 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 07 11:53:26 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 11:58:47 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 11:58:47 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 12:04:29 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 12:07:51 fir-md1-s1 kernel: LustreError: 22649:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 12:11:33 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 12:21:55 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 13:06:31 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 13:08:04 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 13:15:26 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 13:21:32 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 13:32:23 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 13:48:00 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 13:48:00 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 07 14:22:30 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 
14:22:30 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 14:24:16 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 14:30:51 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 14:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3903bb43-6d23-19dc-ccc3-5eecafcff35a (at 10.8.1.36@o2ib6) reconnecting Jul 07 14:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6cf8bc2f-bf0f-5ecb-1a1d-10eb0db43353 (at 10.8.1.36@o2ib6) Jul 07 14:55:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 14:56:45 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 15:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 15:01:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 15:01:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 07 15:01:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 15:12:42 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 15:23:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 15:23:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 15:23:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous 
similar message Jul 07 15:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 15:39:46 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 07 15:48:19 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 16:29:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jul 07 16:39:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 16:39:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 16:39:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 17:01:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 17:01:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 17:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 17:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 17:08:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 07 17:08:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 17:08:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 17:20:34 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 17:21:31 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 17:50:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 17:50:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 17:50:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 17:50:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 17:56:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 17:56:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 18:02:17 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:02:20 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 18:02:27 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:02:40 fir-md1-s1 
kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:02:54 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:03:11 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 18:03:30 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:03:30 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 18:04:11 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:04:11 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jul 07 18:05:22 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:05:22 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8 previous similar messages Jul 07 18:07:36 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:07:36 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 16 previous similar messages Jul 07 18:11:53 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:11:53 fir-md1-s1 kernel: LustreError: 
20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 29 previous similar messages Jul 07 18:20:31 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:20:31 fir-md1-s1 kernel: LustreError: 27481:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 65 previous similar messages Jul 07 18:30:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 18:30:33 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 77 previous similar messages Jul 07 18:40:41 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 155648 GRANT, real grant 0 Jul 07 18:40:41 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 72 previous similar messages Jul 07 19:08:47 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 32768 GRANT, real grant 0 Jul 07 19:08:47 fir-md1-s1 kernel: LustreError: 46576:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 54 previous similar messages Jul 07 19:52:37 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 07 20:32:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 20:32:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 20:33:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 617d800a-afeb-08ed-bb4c-9f77025769ad (at 10.8.25.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2507968000, cur 1562556788 expire 1562556638 last 1562556561 Jul 07 20:33:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 20:34:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 374fd2d9-2972-20b7-dfa4-bf6b2470cf36 (at 10.8.1.6@o2ib6) reconnecting Jul 07 20:34:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.6@o2ib6, removing former export from same NID Jul 07 20:34:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 263eaecf-e81f-64c0-76c4-67b409a3186f (at 10.8.1.6@o2ib6) Jul 07 20:37:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 20:37:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 20:37:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 20:37:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 20:38:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 20:38:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 20:38:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 20:41:29 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 07 20:41:29 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 20:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 20:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 21:07:42 fir-md1-s1 kernel: 
Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 21:07:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 21:23:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0d321477-e1a4-6634-93cf-b59d753ff98f (at 10.8.18.6@o2ib6) reconnecting Jul 07 21:23:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b1f9ccc8-925b-b4d2-9293-aac9aa183623 (at 10.8.18.6@o2ib6) Jul 07 21:40:22 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:40:27 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:40:27 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 21:40:28 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:40:28 fir-md1-s1 kernel: LustreError: 46532:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 07 21:40:30 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:40:30 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 07 21:40:35 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:40:35 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 07 21:40:56 fir-md1-s1 kernel: LustreError: 
21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 07 21:40:56 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jul 07 21:42:00 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:42:00 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 26 previous similar messages Jul 07 21:42:57 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 07 21:42:57 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 32 previous similar messages Jul 07 21:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 730be893-31e8-983c-06e1-f426e82a434b (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f20e2733400, cur 1562561065 expire 1562560915 last 1562560838 Jul 07 21:44:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 21:44:33 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:44:33 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 13 previous similar messages Jul 07 21:46:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jul 07 21:54:15 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 07 21:54:15 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 22:21:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.17@o2ib6, removing former export from same NID Jul 07 22:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7f5b8d8c-996c-1887-f76d-12c3566ba896 (at 10.8.1.17@o2ib6) reconnecting Jul 07 22:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to caccb606-7559-916b-0433-b661c183f103 (at 10.8.1.17@o2ib6) Jul 07 22:21:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 22:21:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1d02240d-6817-4f2d-eb33-71d0a2e61934 (at 10.8.18.3@o2ib6) reconnecting Jul 07 22:21:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 837c124c-41d9-368d-aae3-f10235137c33 (at 10.8.18.3@o2ib6) Jul 07 22:21:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 22:21:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.3@o2ib6, removing former export from same NID Jul 07 22:22:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.18.3@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 07 22:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1d02240d-6817-4f2d-eb33-71d0a2e61934 (at 10.8.18.3@o2ib6) reconnecting Jul 07 22:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 837c124c-41d9-368d-aae3-f10235137c33 (at 10.8.18.3@o2ib6) Jul 07 22:22:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 22:22:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 22:33:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 22:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 22:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 22:33:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 22:34:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 22:34:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 22:34:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 22:47:24 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 22:55:05 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 23:00:36 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 23:00:36 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar 
message Jul 07 23:08:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:08:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:08:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:09:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:09:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:16:57 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 07 23:18:11 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 23:21:11 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 23:22:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:22:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:22:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:22:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:22:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available 
for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 07 23:22:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:22:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:22:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:22:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:23:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:26:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:26:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:27:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:27:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9eb449c2-e54f-1e34-81bc-f024b214ecc1 (at 10.9.114.3@o2ib4) reconnecting Jul 07 23:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to bff84d1e-0a69-b6c4-379f-b22c9974d598 (at 10.9.114.3@o2ib4) Jul 07 23:33:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:33:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, 
removing former export from same NID Jul 07 23:33:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:33:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:34:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:34:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:34:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:34:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 23:35:19 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 23:35:19 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 23:36:46 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 23:37:30 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 07 23:37:30 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 07 23:38:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:38:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:38:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:38:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 
10.8.17.11@o2ib6) Jul 07 23:38:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 07 23:38:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 07 23:38:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:38:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:38:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:38:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 07 23:51:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:51:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:51:44 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 07 23:51:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 07 23:52:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 07 23:52:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 07 23:52:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 07 23:52:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 07 23:52:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 00:06:43 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 00:06:43 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 08 00:14:34 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 00:25:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0d321477-e1a4-6634-93cf-b59d753ff98f (at 10.8.18.6@o2ib6) reconnecting Jul 08 00:25:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b1f9ccc8-925b-b4d2-9293-aac9aa183623 (at 10.8.18.6@o2ib6) Jul 08 00:25:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 00:40:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 00:40:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 00:52:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.6@o2ib6, removing former export from same NID Jul 08 00:52:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 374fd2d9-2972-20b7-dfa4-bf6b2470cf36 (at 10.8.1.6@o2ib6) reconnecting 
Jul 08 00:52:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 263eaecf-e81f-64c0-76c4-67b409a3186f (at 10.8.1.6@o2ib6) Jul 08 00:53:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.6@o2ib6, removing former export from same NID Jul 08 00:53:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 263eaecf-e81f-64c0-76c4-67b409a3186f (at 10.8.1.6@o2ib6) Jul 08 00:53:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:03:22 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 08 01:07:25 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:07:25 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 08 01:07:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 01:07:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 01:07:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:07:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:08:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 01:08:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 01:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 01:08:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 01:08:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:08:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 01:08:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 01:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:09:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 01:09:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 01:09:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:09:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 01:09:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:09:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 01:09:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 01:09:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:09:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:10:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 01:10:04 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:10:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:10:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:13:02 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:13:15 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 08 01:18:58 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:22:23 fir-md1-s1 kernel: LustreError: 46581:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:25:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 01:25:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 01:25:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 01:29:17 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:30:53 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:36:29 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 01:36:29 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 4 previous similar messages Jul 08 02:01:10 
fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 02:01:49 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 02:03:44 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 02:12:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 02:12:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 02:14:22 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 02:14:22 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 08 02:19:39 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 02:26:26 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 02:26:26 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 08 02:27:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 02:27:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 02:27:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 02:27:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 02:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 02:28:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 02:28:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 02:28:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 02:28:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 02:49:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 02:49:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 02:49:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 03:23:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 03:23:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:23:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:24:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 03:24:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 03:24:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:24:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 03:24:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 03:24:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:24:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 03:26:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:26:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:26:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 03:29:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:29:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:29:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 03:29:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:29:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:38:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:38:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:38:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 03:38:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:38:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:46:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 03:46:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 03:46:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 03:46:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 03:46:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 04:22:59 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 08 04:22:59 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 08 04:26:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 04:26:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 04:26:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 08 04:26:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 04:26:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 04:27:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 04:27:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 04:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 04:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 04:27:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 04:28:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 04:28:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 04:28:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 04:28:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 04:29:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 04:29:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 04:32:48 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 08 04:32:48 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 08 04:33:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 04:33:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 04:39:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 04:39:16 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 04:39:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 04:39:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 04:39:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 05:15:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 05:15:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 05:15:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 05:15:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 08 05:15:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 05:15:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 05:15:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 05:16:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 05:16:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 05:16:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 05:16:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 05:16:24 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 08 05:17:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 05:17:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 05:17:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 05:17:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 05:17:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 05:17:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 05:17:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 08 05:18:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 05:18:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 05:18:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 08 05:18:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 05:20:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 05:20:07 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 08 05:20:38 fir-md1-s1 kernel: LustreError: 55546:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1f53e44850 x1631562238067040/t0(0) o256->9d0e62c0-e368-6db8-c860-d1e71d1366bc@10.8.17.11@o2ib6:13/0 lens 304/240 e 0 to 0 dl 1562588443 ref 1 fl Interpret:/0/0 rc 0/0 Jul 08 05:20:38 fir-md1-s1 kernel: LustreError: 55546:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 08 05:21:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 05:21:02 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 08 05:29:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) reconnecting Jul 08 05:29:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Jul 08 05:29:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 05:29:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 05:29:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 07:42:13 fir-md1-s1 kernel: LustreError: 21535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 08 07:52:02 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 155648 GRANT, real grant 0 Jul 08 07:52:02 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 08 08:10:13 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 08 08:19:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26320709-561f-90ed-6684-fea46854b319 (at 10.8.1.29@o2ib6) Jul 08 08:19:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 08:48:16 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 08 08:48:51 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli c9c3f7fc-2b8d-1a18-fd16-3c9107a89baf claims 28672 GRANT, real grant 0 Jul 08 08:55:27 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 08 08:55:27 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 08 09:38:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 374fd2d9-2972-20b7-dfa4-bf6b2470cf36 (at 10.8.1.6@o2ib6) reconnecting Jul 08 09:38:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 263eaecf-e81f-64c0-76c4-67b409a3186f (at 10.8.1.6@o2ib6) Jul 08 09:38:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 09:41:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing 
former export from same NID Jul 08 09:41:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 09:41:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 09:41:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 09:42:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:42:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 09:42:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 09:42:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 09:42:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 09:42:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 09:43:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 09:43:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 09:44:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 09:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 09:44:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 09:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 09:44:24 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 08 09:44:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 09:44:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:45:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:45:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 09:45:48 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 08 09:46:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 09:46:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 09:46:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 09:46:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 09:46:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 09:46:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 09:46:52 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 08 09:47:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.33@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:47:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.35@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:48:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 09:48:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 09:48:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 09:48:01 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 08 09:49:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.19@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:50:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 09:50:50 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 08 09:50:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.67@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:50:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 08 09:51:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 09:51:34 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 08 09:53:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.20.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 09:53:02 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 08 09:53:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 09:53:18 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 08 09:55:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 09:55:17 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 08 09:57:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.20.18@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 09:57:30 fir-md1-s1 kernel: LustreError: Skipped 16 previous similar messages Jul 08 10:00:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 10:00:09 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 08 10:03:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 10:03:35 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 08 10:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 10:04:03 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 08 10:10:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 10:10:26 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 08 10:10:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 10:10:51 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 08 10:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 10:14:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 08 10:15:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 10:15:07 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 08 10:20:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 10:20:34 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 08 10:24:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 10:24:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 08 10:25:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 10:25:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 08 10:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 10:25:18 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 08 10:30:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 10:30:38 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 08 10:35:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 10:35:01 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 08 10:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 10:35:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 08 10:35:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 10:35:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 10:40:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 10:40:45 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 08 10:45:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 10:45:31 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 08 10:45:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 10:45:52 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 08 10:51:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 10:51:01 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 08 10:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 10:56:08 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 08 10:57:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.11@o2ib6, removing former export from same NID Jul 08 10:57:12 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 08 10:59:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 08 10:59:42 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 11:01:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 11:01:04 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 08 11:01:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 11:01:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 11:04:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a2700990-6487-6425-0ded-6ef948a9753e (at 10.8.30.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f8b3e000, cur 1562609096 expire 1562608946 last 1562608869 Jul 08 11:04:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 11:06:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 11:06:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 08 11:06:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 11:06:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 08 11:07:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 11:07:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 08 11:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 11:11:15 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 08 11:14:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 11:14:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 11:16:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 11:16:49 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 08 11:17:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 11:17:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 11:21:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 11:21:20 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 08 11:27:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 11:27:19 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 08 11:27:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 11:27:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 08 11:31:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 11:31:06 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 08 11:31:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 11:31:24 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 08 11:37:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 11:37:26 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 08 11:37:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 11:37:39 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 08 11:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 11:42:37 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 08 11:47:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 11:47:41 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 08 11:48:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 11:48:01 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 08 11:48:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 11:48:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 11:52:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 11:52:48 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 11:58:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 11:58:52 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 08 11:58:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 11:58:54 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 08 12:03:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 12:03:05 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 08 12:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 12:08:54 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 12:10:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 12:10:31 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 08 12:11:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 12:11:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 12:13:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 12:13:15 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 08 12:19:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 12:19:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 08 12:20:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 12:20:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 12:20:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 12:20:35 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 08 12:23:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 12:23:31 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 08 12:29:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 12:29:18 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 08 12:30:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 12:30:44 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 08 12:33:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 12:33:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 12:33:42 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 08 12:36:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2534fc0400, cur 1562614606 expire 1562614456 last 1562614379 Jul 08 12:36:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 08 12:37:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 12:37:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 12:38:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 12:39:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 12:39:56 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 08 12:40:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 12:40:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 12:40:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 12:40:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 08 12:44:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 08 12:44:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 12:44:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 12:44:27 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 08 12:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148ce91c00, cur 1562615302 expire 1562615152 last 1562615075 Jul 08 12:50:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 12:50:15 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 08 12:50:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 12:50:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 12:53:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fd98ce000, cur 1562615597 expire 1562615447 last 1562615370 Jul 08 12:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 12:54:37 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 08 12:56:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 12:56:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 08 13:00:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 13:00:29 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 08 13:01:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 13:01:09 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 08 13:04:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 13:04:39 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 08 13:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 13:11:12 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 08 13:11:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 13:11:18 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 08 13:13:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 13:13:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 08 13:14:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 13:14:45 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 08 13:21:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 13:21:21 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 08 13:21:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 13:21:25 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 13:24:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 13:24:01 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 08 13:24:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 13:24:55 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 08 13:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e488b13f-c3ed-66de-0053-32b5151ace52 (at 10.8.15.6@o2ib6) in 192 seconds. I think it's dead, and I am evicting it. exp ffff8f229dea1000, cur 1562617549 expire 1562617399 last 1562617357 Jul 08 13:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e488b13f-c3ed-66de-0053-32b5151ace52 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f229dea2c00, cur 1562617584 expire 1562617434 last 1562617357 Jul 08 13:31:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 13:31:33 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 13:33:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 08 13:33:26 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 08 13:34:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 13:34:44 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 08 13:35:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 13:35:00 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 08 13:41:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 13:41:36 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 13:45:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 13:45:03 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 08 13:45:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 13:45:03 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 08 13:50:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 13:50:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 08 13:51:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 13:51:55 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 08 13:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 13:55:28 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 08 13:55:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 13:55:53 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 08 14:01:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 14:01:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 14:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 14:02:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 14:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 14:05:32 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 08 14:06:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 14:06:00 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 14:12:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 14:12:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 08 14:14:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 14:14:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 14:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 14:15:49 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 08 14:18:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 14:18:06 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 08 14:22:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 14:22:55 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 08 14:25:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 14:25:46 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 08 14:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 14:26:05 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 08 14:29:19 fir-md1-s1 kernel: Lustre: 23730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562621348/real 1562621348] req@ffff8f37212cd400 x1636727036156160/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562621359 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 14:29:19 fir-md1-s1 kernel: Lustre: 23730:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 08 14:29:23 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3bcaa46300 x1633654418410912/t0(0) o36->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:28/0 lens 496/448 e 1 to 0 dl 1562621368 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 14:29:23 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 574 previous similar messages Jul 08 14:29:24 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f428476fb00 x1631569431335360/t0(0) o101->20b94f29-3d6d-5fdd-bf3c-536686b5a4fe@10.9.107.47@o2ib4:29/0 lens 576/0 e 1 to 0 dl 1562621369 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 08 14:29:24 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 252 previous similar messages Jul 08 14:29:25 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f201b773600 x1631609088794304/t0(0) o101->c816839b-680c-9a56-ca6b-6b0e082ba795@10.9.106.34@o2ib4:0/0 lens 576/0 e 1 to 0 dl 1562621370 ref 2 fl 
New:/0/ffffffff rc 0/-1 Jul 08 14:29:25 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 107 previous similar messages Jul 08 14:29:27 fir-md1-s1 kernel: Lustre: 23570:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b9815d700 x1634169439253792/t0(0) o101->97b378ef-cbc6-b9bf-0007-7fdb21d6a3a7@10.9.109.23@o2ib4:2/0 lens 576/0 e 1 to 0 dl 1562621372 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 08 14:29:27 fir-md1-s1 kernel: Lustre: 23570:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 156 previous similar messages Jul 08 14:29:30 fir-md1-s1 kernel: Lustre: 23727:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f3bcaa40600 x1634079367746016/t0(0) o101->49aa8323-a38d-3237-508c-ea94c68aa863@10.9.108.53@o2ib4:28/0 lens 576/0 e 1 to 0 dl 1562621368 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 08 14:29:30 fir-md1-s1 kernel: LustreError: 21128:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.101.27@o2ib4: deadline 20:1s ago req@ffff8f43fdbea700 x1631659625120224/t0(0) o101->b7aae4ae-1aa0-9e5d-5ecf-90e4dbcd33de@10.9.101.27@o2ib4:29/0 lens 576/0 e 1 to 0 dl 1562621369 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 08 14:29:30 fir-md1-s1 kernel: LustreError: 21128:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 08 14:29:30 fir-md1-s1 kernel: Lustre: 23727:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 267 previous similar messages Jul 08 14:30:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 14:30:29 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 08 14:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 14:33:36 fir-md1-s1 
kernel: Lustre: Skipped 727 previous similar messages Jul 08 14:36:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 14:36:18 fir-md1-s1 kernel: Lustre: Skipped 759 previous similar messages Jul 08 14:40:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 14:40:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 14:41:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 14:41:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 08 14:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 14:43:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 14:46:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 14:46:32 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 08 14:51:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 14:51:22 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 14:53:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 14:53:43 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 14:54:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 14:54:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 08 14:57:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 14:57:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 08 15:02:18 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3385afe000, cur 1562623338 expire 1562623188 last 1562623111 Jul 08 15:02:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 08 15:03:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 15:03:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 08 15:04:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 15:04:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 08 15:04:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 15:04:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 15:08:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 15:08:00 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 08 15:15:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 15:15:17 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 15:16:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 15:16:01 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 15:18:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 15:18:22 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 08 15:19:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 15:19:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 15:19:01 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 08 15:26:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 15:26:27 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 15:26:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 15:26:30 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 08 15:28:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 15:28:26 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 08 15:29:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 15:29:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 15:32:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25211cbc00, cur 1562625132 expire 1562624982 last 1562624905 Jul 08 15:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 15:37:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 08 15:37:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 15:37:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 15:38:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 15:38:27 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 08 15:47:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 15:47:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 15:47:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 15:47:23 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 08 15:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 15:48:34 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 08 15:51:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 15:52:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 15:52:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 15:58:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 15:58:02 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 08 15:58:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 15:58:45 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 08 15:59:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 15:59:46 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 08 16:06:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 16:06:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 16:08:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 16:08:28 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 16:08:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 16:08:55 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 08 16:10:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 16:10:16 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 08 16:18:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 16:18:40 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 16:19:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 16:19:06 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 08 16:21:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 16:21:06 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 08 16:28:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 16:28:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 08 16:29:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 16:29:17 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 08 16:32:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 16:32:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 08 16:36:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 16:36:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8447a07a-e92a-94fe-737c-da4e88830639 (at 10.9.107.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a5e53a800, cur 1562628969 expire 1562628819 last 1562628742 Jul 08 16:37:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 16:38:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 16:39:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 16:39:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 16:40:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 16:40:23 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 08 16:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4dda764c-5ca7-3340-a1d3-17b756c64805 (at 10.8.0.67@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1644fc6800, cur 1562629259 expire 1562629109 last 1562629032 Jul 08 16:40:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 16:42:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 16:42:45 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 16:49:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 16:49:43 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 08 16:50:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 16:50:36 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 08 16:53:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 16:53:34 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 08 16:54:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 16:55:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 08 16:56:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:00:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 17:00:03 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 08 17:01:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 17:01:06 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 08 17:04:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 17:04:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 08 17:10:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 17:10:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 08 17:11:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 17:11:13 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 08 17:14:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 17:14:05 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 08 17:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 17:20:38 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 08 17:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 17:21:22 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar 
messages Jul 08 17:24:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 17:24:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 08 17:28:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 17:31:04 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 08 17:31:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 17:31:22 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 08 17:34:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 17:34:36 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 08 17:35:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:36:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:36:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:37:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 17:40:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:41:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 17:41:14 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 08 17:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 17:41:30 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 08 17:43:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:44:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:44:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 17:44:49 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 17:46:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:46:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 08 17:49:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 17:49:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 08 17:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 17:51:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 17:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 17:51:48 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 08 17:54:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 17:54:26 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 08 17:55:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 17:55:53 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 08 18:02:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 18:02:10 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 18:02:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 18:02:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 08 18:04:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 18:04:43 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 08 18:08:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 18:08:07 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 18:12:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 18:12:12 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 08 18:12:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 18:12:58 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 18:15:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 18:15:25 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 08 18:18:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 18:18:22 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 08 18:22:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 18:22:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 08 18:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 18:23:17 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 08 18:26:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 18:26:03 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 08 18:30:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 18:30:13 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 08 18:32:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 18:32:28 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 18:33:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 18:33:38 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 08 18:36:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 18:36:55 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 08 18:41:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 18:41:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 08 18:42:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 18:42:28 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 08 18:43:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 18:43:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 08 18:46:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 18:46:58 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 08 18:51:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 08 18:51:53 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 08 18:52:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 18:52:34 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 08 18:53:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 18:53:57 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 18:57:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 18:57:11 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 08 19:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 19:02:34 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 08 19:02:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 19:02:44 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 08 19:04:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 19:04:42 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 19:06:15 fir-md1-s1 kernel: Lustre: 22280:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1bef6cd700 x1631840018373024/t0(0) o101->533f2d59-21df-dd34-d3a6-f780aca8b580@10.8.25.3@o2ib6:20/0 lens 480/568 e 0 to 0 dl 1562637980 ref 2 fl Interpret:/0/0 rc 
0/0 Jul 08 19:06:15 fir-md1-s1 kernel: Lustre: 22280:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 178 previous similar messages Jul 08 19:06:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e007c1f80/0x5d9ee63602816224 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55c8bda65 expref: 43 pid: 10143 timeout: 1753039 lvb_type: 0 Jul 08 19:07:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 19:07:50 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 08 19:10:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2af4e29400, cur 1562638215 expire 1562638065 last 1562637988 Jul 08 19:10:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 19:12:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 19:12:44 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 08 19:14:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 19:14:05 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 08 19:14:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 19:14:53 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 08 19:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 19:18:14 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 08 19:22:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 19:22:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 08 19:24:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 19:24:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 08 19:25:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 19:25:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 19:30:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 19:30:18 fir-md1-s1 kernel: LustreError: Skipped 14 previous similar messages Jul 08 19:33:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 19:33:06 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 08 19:34:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 19:34:41 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 08 19:35:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 19:35:50 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 19:40:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 19:40:50 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 08 19:43:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 19:43:09 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 08 19:45:52 fir-md1-s1 kernel: Lustre: 23710:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3e497b3c00 x1631318301775872/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:27/0 lens 480/568 e 0 to 0 dl 1562640357 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 19:45:57 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.30.19@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e0776d580/0x5d9ee636114fc7ba lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.8.30.19@o2ib6 remote: 0xcd8d918f46f6186b expref: 46 pid: 
97644 timeout: 1755417 lvb_type: 0 Jul 08 19:46:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 19:46:26 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 19:46:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 08 19:46:42 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 08 19:52:17 fir-md1-s1 kernel: Lustre: 22280:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e9e830c00 x1631318302836336/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:22/0 lens 480/568 e 1 to 0 dl 1562640742 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 19:52:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 19:52:21 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 08 19:52:28 fir-md1-s1 kernel: Lustre: 23743:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2885688300 x1631827979371248/t0(0) o101->d3f5a92e-e73a-b021-4354-c2176911d60c@10.8.30.19@o2ib6:3/0 lens 480/568 e 0 to 0 dl 1562640753 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 19:52:56 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f30d5640f00 x1633783498073088/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:1/0 lens 480/568 e 0 to 0 dl 1562640781 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 19:52:56 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 19:53:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2753adec00/0x5d9ee63613f09f8d lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55c98ca61 expref: 50 pid: 21679 timeout: 1755840 lvb_type: 0 Jul 08 19:53:10 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2fcbd5bc00 x1631827979378736/t0(0) o101->d3f5a92e-e73a-b021-4354-c2176911d60c@10.8.30.19@o2ib6:15/0 lens 480/568 e 0 to 0 dl 1562640795 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 19:53:10 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 19:53:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b4e8a7c4-09eb-baae-5220-9b1baa9441aa (at 10.8.30.19@o2ib6) Jul 08 19:53:16 fir-md1-s1 kernel: Lustre: Skipped 38 previous 
similar messages Jul 08 19:53:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.24.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f324c2f8d80/0x5d9ee63613f18bdb lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.24.4@o2ib6 remote: 0x8a5ac3af8bf42be6 expref: 40 pid: 23704 timeout: 1755870 lvb_type: 0 Jul 08 19:54:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2cc9bc9d40/0x5d9ee63613f245a1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd263afcb expref: 847 pid: 23748 timeout: 1755900 lvb_type: 0 Jul 08 19:54:01 fir-md1-s1 kernel: LustreError: 97644:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1fceb30000 x1636727138629184/t0(0) o104->fir-MDT0002@10.8.25.23@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 08 19:54:01 fir-md1-s1 kernel: LustreError: 97654:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f451bb7cc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f1c9ba2f740/0x5d9ee63614703e9a lrc: 3/0,0 mode: PW/PW res: [0x2c002c126:0x3e:0x0].0x0 bits 0x40/0x0 rrc: 14 type: IBT flags: 0x50200000000000 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd26dd78e expref: 384 pid: 97654 timeout: 0 lvb_type: 0 Jul 08 19:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 19:56:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 08 19:57:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 19:57:33 fir-md1-s1 kernel: Lustre: Skipped 
22 previous similar messages Jul 08 20:03:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 20:03:38 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 08 20:04:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 20:04:30 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 08 20:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 20:06:50 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 08 20:07:05 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562641618/real 1562641618] req@ffff8f28367f0f00 x1636727143978848/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562641625 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 20:07:05 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 08 20:07:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 20:07:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 08 20:10:05 fir-md1-s1 kernel: Lustre: 23733:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562641798/real 1562641798] req@ffff8f3468ae5400 x1636727145014864/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562641805 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 20:13:40 fir-md1-s1 kernel: Lustre: 21420:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1fe3246c00 x1634176436499536/t0(0) 
o101->bff671a6-6393-a53b-8c2a-0f521cd0a513@10.9.109.13@o2ib4:15/0 lens 1768/3288 e 1 to 0 dl 1562642025 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:13:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 20:13:56 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 08 20:14:11 fir-md1-s1 kernel: Lustre: 20511:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22d137c800 x1631318304603760/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:16/0 lens 480/568 e 1 to 0 dl 1562642056 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:14:20 fir-md1-s1 kernel: Lustre: 24581:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f24d5a89800 x1633783517478704/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:25/0 lens 480/568 e 0 to 0 dl 1562642065 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:14:20 fir-md1-s1 kernel: Lustre: 24581:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 20:14:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f23d98cf500/0x5d9ee6361c267447 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55c9bcc61 expref: 19 pid: 50446 timeout: 1757125 lvb_type: 0 Jul 08 20:14:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 20:14:43 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 08 20:14:55 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f230e1dee40/0x5d9ee6361c26f68b lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd2bfc1c7 expref: 217 pid: 26256 timeout: 1757155 lvb_type: 0 Jul 08 20:14:55 fir-md1-s1 kernel: LustreError: 21679:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f30cf2bd400 ns: mdt-fir-MDT0002_UUID lock: ffff8f2873f63600/0x5d9ee6361c8a35f2 lrc: 3/0,0 mode: PW/PW res: [0x2c002c406:0x3:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x50200000000000 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd2c58c77 expref: 135 pid: 21679 timeout: 0 lvb_type: 0 Jul 08 20:14:55 fir-md1-s1 kernel: LustreError: 31015:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.25.23@o2ib6 arrived at 1562642095 with bad export cookie 6746082411793135065 Jul 08 20:14:55 fir-md1-s1 kernel: LustreError: 21679:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 08 20:15:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.24.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f204f088240/0x5d9ee6361c27850f lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.24.4@o2ib6 remote: 0x8a5ac3af8bfbb244 expref: 19 pid: 22287 timeout: 1757185 lvb_type: 0 Jul 08 20:16:24 fir-md1-s1 kernel: Lustre: 24581:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e0841b000 x1631318304621008/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:29/0 
lens 480/568 e 1 to 0 dl 1562642189 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:16:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ddef0525-fd05-baf0-eec8-55af7a82431b (at 10.8.24.4@o2ib6) reconnecting Jul 08 20:16:51 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 08 20:17:34 fir-md1-s1 kernel: LustreError: 23739:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562642164, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f330a250240/0x5d9ee6361d05c501 lrc: 3/0,1 mode: --/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23739 timeout: 0 lvb_type: 0 Jul 08 20:17:34 fir-md1-s1 kernel: LustreError: 23739:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 08 20:17:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.19@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f214755a1c0/0x5d9ee6361d028da8 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.30.19@o2ib6 remote: 0xcd8d918f4702d808 expref: 19 pid: 24583 timeout: 1757315 lvb_type: 0 Jul 08 20:17:39 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562642169, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f221fa1fbc0/0x5d9ee6361d0cf34c lrc: 3/0,1 mode: --/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 14 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97672 timeout: 0 lvb_type: 0 Jul 08 20:17:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 20:17:53 fir-md1-s1 kernel: Lustre: 
Skipped 34 previous similar messages Jul 08 20:19:20 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f28862d6c00 x1631827981350560/t0(0) o101->d3f5a92e-e73a-b021-4354-c2176911d60c@10.8.30.19@o2ib6:25/0 lens 480/568 e 0 to 0 dl 1562642365 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:19:20 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 08 20:19:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e50a86c00/0x5d9ee6361e1c8f72 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55c9bddf6 expref: 19 pid: 97672 timeout: 1757424 lvb_type: 0 Jul 08 20:19:54 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.30.19@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0527a072c0/0x5d9ee6361e1d158a lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.8.30.19@o2ib6 remote: 0xcd8d918f47038bae expref: 19 pid: 23692 timeout: 1757454 lvb_type: 0 Jul 08 20:23:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 20:23:57 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 08 20:24:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 20:24:59 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 08 20:26:13 fir-md1-s1 kernel: Lustre: 23733:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562642766/real 1562642766] req@ffff8f2fc5d36c00 x1636727151419936/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562642773 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 20:26:40 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2129fc2d00 x1631827982161376/t0(0) o101->d3f5a92e-e73a-b021-4354-c2176911d60c@10.8.30.19@o2ib6:15/0 lens 480/568 e 0 to 0 dl 1562642805 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:26:40 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 08 20:26:44 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1a1686b3c0/0x5d9ee636209c8467 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd307bed6 expref: 194 pid: 23733 timeout: 1757864 lvb_type: 0 Jul 08 20:26:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 20:26:55 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 08 20:28:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3489786800, cur 1562642899 expire 1562642749 last 1562642672 Jul 08 20:31:34 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f34ccddad00 x1631318306240656/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:9/0 lens 480/568 e 0 to 0 dl 1562643099 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 20:31:34 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 20:31:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.19@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f348c740b40/0x5d9ee636224a4ea6 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.30.19@o2ib6 remote: 0xcd8d918f47099e48 expref: 20 pid: 21380 timeout: 1758158 lvb_type: 0 Jul 08 20:32:24 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562643137/real 1562643137] req@ffff8f2af9ef3600 x1636727153982944/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562643144 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 20:33:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 20:33:14 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 08 20:35:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 20:35:03 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 08 20:35:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 20:35:08 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 08 20:37:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 20:37:16 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 08 20:43:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 20:43:32 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 08 20:45:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 20:45:18 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 08 20:45:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 20:45:19 fir-md1-s1 kernel: LustreError: Skipped 20 previous similar messages Jul 08 20:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 20:47:30 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 08 20:52:53 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562644366/real 1562644366] req@ffff8f0d3e72a100 x1636727162900240/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562644373 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 20:53:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 20:53:43 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 08 20:55:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 20:55:29 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar 
messages Jul 08 20:56:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 20:56:02 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 08 20:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 20:57:43 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 08 21:00:12 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f279a4f1500 x1631596157296432/t0(0) o101->169021a4-a808-827d-1880-f3d0a2ab5ac3@10.9.103.20@o2ib4:17/0 lens 480/568 e 1 to 0 dl 1562644817 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:00:12 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 21:00:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e94a74140/0x5d9ee6362cc520ea lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55c9e43c6 expref: 19 pid: 97654 timeout: 1759886 lvb_type: 0 Jul 08 21:00:44 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 08 21:00:45 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 08 21:00:45 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 261 previous similar messages Jul 08 21:00:46 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear 
the changelog for user 1: -22 Jul 08 21:00:46 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 243 previous similar messages Jul 08 21:00:49 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 08 21:00:49 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 469 previous similar messages Jul 08 21:00:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.103.20@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2fb158d7c0/0x5d9ee6362cc553fe lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x75:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.9.103.20@o2ib4 remote: 0x8e6a8fb7733dfaab expref: 277 pid: 23608 timeout: 1759916 lvb_type: 0 Jul 08 21:03:26 fir-md1-s1 kernel: Lustre: 10198:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a0d910f00 x1631596157941024/t0(0) o101->169021a4-a808-827d-1880-f3d0a2ab5ac3@10.9.103.20@o2ib4:1/0 lens 480/568 e 1 to 0 dl 1562645011 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:03:26 fir-md1-s1 kernel: Lustre: 10198:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 21:03:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 21:03:47 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 08 21:04:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.20@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f06fc992880/0x5d9ee6362de4771a lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.9.103.20@o2ib4 remote: 0x8e6a8fb773412195 expref: 34 pid: 
23618 timeout: 1760107 lvb_type: 0 Jul 08 21:04:08 fir-md1-s1 kernel: LustreError: 23103:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.103.20@o2ib4 arrived at 1562645048 with bad export cookie 6746082412206550431 Jul 08 21:05:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 21:05:46 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 08 21:06:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 21:06:55 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 08 21:08:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 21:08:00 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 08 21:10:16 fir-md1-s1 kernel: Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562645409/real 1562645409] req@ffff8f36e667c800 x1636727169387760/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562645416 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 21:12:35 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562645548/real 1562645548] req@ffff8f2215a59e00 x1636727170095040/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562645555 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 21:13:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 21:13:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 08 21:16:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 
21:16:12 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 08 21:17:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3e516bf000, cur 1562645857 expire 1562645707 last 1562645630 Jul 08 21:18:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 21:18:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 08 21:18:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 21:18:26 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 08 21:19:23 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f250c4e2100 x1631596161683808/t0(0) o101->169021a4-a808-827d-1880-f3d0a2ab5ac3@10.9.103.20@o2ib4:28/0 lens 480/568 e 0 to 0 dl 1562645968 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:19:23 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 21:19:27 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f34a3a45e80/0x5d9ee63633f88b91 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca01133 expref: 19 pid: 23664 timeout: 1761027 lvb_type: 0 Jul 08 21:19:57 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: 
ffff8f19f0a4bcc0/0x5d9ee63633f9e195 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd4173abf expref: 169 pid: 22288 timeout: 1761057 lvb_type: 0 Jul 08 21:23:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 21:23:56 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 08 21:26:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 21:26:13 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 08 21:28:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 21:28:44 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 08 21:28:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 21:28:58 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 08 21:32:17 fir-md1-s1 kernel: Lustre: 21368:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562646729/real 1562646729] req@ffff8f08e6547500 x1636727177556416/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562646736 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 21:32:23 fir-md1-s1 kernel: Lustre: 10143:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562646736/real 1562646736] req@ffff8f348c681200 x1636727177577648/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562646743 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 21:34:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 21:34:41 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 08 21:35:41 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562646934/real 1562646934] req@ffff8f344d2ca400 x1636727178440112/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562646941 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 21:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 08 21:36:17 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 08 21:38:06 fir-md1-s1 kernel: Lustre: 21413:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f349849fb00 x1633783605451072/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:11/0 lens 480/568 e 0 to 0 dl 1562647091 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:38:06 fir-md1-s1 kernel: Lustre: 21413:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 21:38:10 
fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e6ff50900/0x5d9ee6363b5914a1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca325fc expref: 57 pid: 23710 timeout: 1762150 lvb_type: 0 Jul 08 21:38:44 fir-md1-s1 kernel: Lustre: 97652:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d8f7bfb00 x1633783606210032/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:19/0 lens 480/568 e 0 to 0 dl 1562647129 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:38:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 21:38:47 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 08 21:39:39 fir-md1-s1 kernel: Lustre: 23601:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26008ddd00 x1633783606826736/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:14/0 lens 480/568 e 0 to 0 dl 1562647184 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:39:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1caecc2640/0x5d9ee6363c0b947e lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca33426 expref: 19 pid: 24585 timeout: 1762243 lvb_type: 0 Jul 08 21:40:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 21:40:11 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 08 21:42:36 fir-md1-s1 kernel: Lustre: 21460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1cefb3ef00 x1633783610784720/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:11/0 lens 480/568 e 0 to 0 dl 1562647361 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:45:46 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1ce67e0900 x1633783615047440/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:21/0 lens 480/568 e 0 to 0 dl 1562647551 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:45:50 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f34622d72c0/0x5d9ee6363e7a0f5b lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca33f1d expref: 19 pid: 23748 timeout: 1762610 lvb_type: 0 Jul 08 21:46:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 21:46:40 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 08 21:47:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 21:47:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 08 21:48:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 21:48:53 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 08 21:50:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 08 21:50:30 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 08 21:53:38 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f09b6b27500 x1631538537486240/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:13/0 lens 480/568 e 0 to 0 dl 1562648023 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 21:53:42 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f16b6d8a640/0x5d9ee636419d63c1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca4b358 expref: 19 pid: 21413 timeout: 1763082 lvb_type: 0 Jul 08 21:54:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.103.34@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f13a9344140/0x5d9ee636419da4e3 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.9.103.34@o2ib4 remote: 0x479fd480650c933e expref: 151 pid: 23612 timeout: 1763112 lvb_type: 0 Jul 08 21:54:13 fir-md1-s1 kernel: LustreError: 23554:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1488901000 ns: mdt-fir-MDT0002_UUID lock: ffff8f10de4933c0/0x5d9ee6364200bbf5 lrc: 3/0,0 mode: PW/PW res: [0x2c002c148:0x7d:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x50200000000000 nid: 10.9.103.34@o2ib4 remote: 0x479fd480650d9db5 expref: 27 pid: 23554 timeout: 0 lvb_type: 0 Jul 08 21:54:13 fir-md1-s1 kernel: LustreError: 23554:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Jul 08 21:56:46 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 08 21:56:46 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 08 21:57:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 21:57:46 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 08 21:58:24 fir-md1-s1 kernel: Lustre: 20511:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562648297/real 1562648297] req@ffff8f2215a5aa00 x1636727187978544/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562648304 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 21:59:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 21:59:03 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 08 22:01:49 fir-md1-s1 kernel: Lustre: 10143:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562648502/real 1562648502] req@ffff8f34a92ca700 x1636727189714544/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562648509 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:02:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 22:02:28 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 08 22:03:11 fir-md1-s1 kernel: Lustre: 27320:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562648584/real 1562648584] req@ffff8f0b5a8c8000 x1636727190085616/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562648591 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:04:23 fir-md1-s1 kernel: Lustre: 23711:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562648656/real 1562648656] req@ffff8f3e28a00300 x1636727190501888/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562648663 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:06:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 08 22:06:48 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 08 22:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 08 22:08:06 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 08 22:09:19 fir-md1-s1 kernel: Lustre: 23618:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0eb9dcce00 x1631538539899360/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:24/0 lens 480/568 e 1 to 0 dl 1562648964 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:09:19 fir-md1-s1 kernel: Lustre: 23618:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 22:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8 (at 10.9.103.34@o2ib4) reconnecting Jul 08 22:09:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 08 22:09:29 fir-md1-s1 kernel: Lustre: 23645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not 
sending early reply req@ffff8f2fce759500 x1633783633323312/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:4/0 lens 480/568 e 0 to 0 dl 1562648974 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:09:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1eceb01f80/0x5d9ee6364793a473 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca7c0a3 expref: 19 pid: 24585 timeout: 1764033 lvb_type: 0 Jul 08 22:12:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 22:12:43 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 08 22:16:05 fir-md1-s1 kernel: Lustre: 23618:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0fd4606900 x1631538540468448/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:10/0 lens 480/568 e 0 to 0 dl 1562649370 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:16:09 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3321d0cec0/0x5d9ee6364a32b6dc lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 14 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd4e438a7 expref: 391 pid: 23608 timeout: 1764429 lvb_type: 0 Jul 08 22:16:09 fir-md1-s1 kernel: LustreError: 25079:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.25.23@o2ib6 arrived at 1562649369 with bad export cookie 6746082412328613743 Jul 08 22:16:10 fir-md1-s1 kernel: LustreError: 
23723:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f237ea3e800 ns: mdt-fir-MDT0002_UUID lock: ffff8f287e35b840/0x5d9ee6364a6868c1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c299:0x76:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x50200000000000 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd4e7f761 expref: 10 pid: 23723 timeout: 0 lvb_type: 0 Jul 08 22:17:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 22:17:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 08 22:18:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 08 22:18:53 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 08 22:19:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 22:19:54 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 08 22:24:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 22:24:03 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 08 22:24:24 fir-md1-s1 kernel: Lustre: 23579:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f36d1a3b300 x1631538542400400/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:29/0 lens 480/568 e 0 to 0 dl 1562649869 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:24:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f23088eba80/0x5d9ee6364d70cfbd lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55ca9f92b expref: 19 pid: 97654 timeout: 1764928 lvb_type: 0 Jul 08 22:25:16 fir-md1-s1 kernel: Lustre: 23748:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26a6721e00 x1631776452617600/t0(0) o101->cb1e051f-12ef-c393-c1de-bc60ba01debc@10.8.13.11@o2ib6:21/0 lens 480/568 e 0 to 0 dl 1562649921 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:25:16 fir-md1-s1 kernel: Lustre: 23748:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 08 22:25:20 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1f4f487bc0/0x5d9ee6364db317bd lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd51b49dc expref: 106 pid: 20730 timeout: 1764980 lvb_type: 0 Jul 08 22:25:20 fir-md1-s1 kernel: LustreError: 25030:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.25.23@o2ib6 arrived at 1562649920 with bad export cookie 6746082412698566123 Jul 08 22:25:20 
fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1a2f6c6000 ns: mdt-fir-MDT0002_UUID lock: ffff8f263bf23a80/0x5d9ee6364de32c02 lrc: 3/0,0 mode: PW/PW res: [0x2c002c408:0x4:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x50200000000000 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd51ee12f expref: 24 pid: 21003 timeout: 0 lvb_type: 0 Jul 08 22:25:20 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 08 22:25:50 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.103.34@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f122353f980/0x5d9ee6364db42cbb lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.9.103.34@o2ib4 remote: 0x479fd480651f07ce expref: 43 pid: 22280 timeout: 1765010 lvb_type: 0 Jul 08 22:25:51 fir-md1-s1 kernel: LustreError: 23632:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2fd1ddc400 ns: mdt-fir-MDT0002_UUID lock: ffff8f07d5ff2d00/0x5d9ee6364e158666 lrc: 3/0,0 mode: PW/PW res: [0x2c002c148:0x7e:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x50200000000000 nid: 10.9.103.34@o2ib4 remote: 0x479fd480651fd36f expref: 8 pid: 23632 timeout: 0 lvb_type: 0 Jul 08 22:25:51 fir-md1-s1 kernel: LustreError: 23632:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 08 22:27:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 22:27:33 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 08 22:28:43 fir-md1-s1 kernel: Lustre: 23686:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f35e0a4e900 x1631538543215680/t0(0) 
o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:18/0 lens 480/568 e 1 to 0 dl 1562650128 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:28:43 fir-md1-s1 kernel: Lustre: 23686:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 22:30:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 22:30:00 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 08 22:30:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 22:30:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 08 22:30:39 fir-md1-s1 kernel: Lustre: 97665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562650232/real 1562650232] req@ffff8f1f6a956c00 x1636727203823808/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562650239 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:32:42 fir-md1-s1 kernel: Lustre: 23571:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0e3be98300 x1631538543770800/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:17/0 lens 480/568 e 0 to 0 dl 1562650367 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:32:42 fir-md1-s1 kernel: Lustre: 23571:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 08 22:32:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2122373a80/0x5d9ee636507a323b lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd541da60 expref: 146 pid: 50446 timeout: 1765426 lvb_type: 0 Jul 08 22:32:47 fir-md1-s1 kernel: LustreError: 
23678:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1e5214b400 ns: mdt-fir-MDT0002_UUID lock: ffff8f2604608fc0/0x5d9ee63650af3e73 lrc: 3/0,0 mode: PW/PW res: [0x2c002c11e:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x50200000000000 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd545183b expref: 63 pid: 23678 timeout: 0 lvb_type: 0 Jul 08 22:32:47 fir-md1-s1 kernel: LustreError: 23678:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 08 22:34:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 22:34:04 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 08 22:35:40 fir-md1-s1 kernel: Lustre: 27320:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562650533/real 1562650533] req@ffff8f4328321b00 x1636727205751232/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562650540 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:37:21 fir-md1-s1 kernel: Lustre: 23645:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562650634/real 1562650634] req@ffff8f34415e6c00 x1636727206274400/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562650641 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:38:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 22:38:14 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 08 22:40:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 22:40:07 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 08 22:40:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Jul 08 22:40:42 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 08 22:41:26 fir-md1-s1 kernel: Lustre: 23678:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2898bbcb00 x1633783665580720/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:1/0 lens 480/568 e 1 to 0 dl 1562650891 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:41:26 fir-md1-s1 kernel: Lustre: 23678:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 08 22:41:39 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0c60728d80/0x5d9ee63653edc3aa lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55cab25de expref: 19 pid: 20555 timeout: 1765959 lvb_type: 0 Jul 08 22:42:40 fir-md1-s1 kernel: LustreError: 10146:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1dc731fc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f34ea7e0900/0x5d9ee636548d1880 lrc: 3/0,0 mode: PW/PW res: [0x2c002c183:0xb2:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x50200000000000 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd57d3b7a expref: 33 pid: 10146 timeout: 0 lvb_type: 0 Jul 08 22:45:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 22:45:17 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 08 22:48:12 fir-md1-s1 kernel: Lustre: 23710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562651285/real 1562651285] req@ffff8f341ba85700 x1636727211140336/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562651292 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:48:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 22:48:30 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 08 22:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 22:50:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 08 22:51:00 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1e12182d00 x1631538547465696/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:5/0 lens 480/568 e 0 to 0 dl 1562651465 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 22:51:00 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 08 22:51:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3332bab600/0x5d9ee63657f89a62 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55cabe0b5 expref: 19 pid: 21679 timeout: 1766524 lvb_type: 0 Jul 08 22:51:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 08 22:54:40 fir-md1-s1 kernel: LustreError: 25087:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel 
from 10.8.30.23@o2ib6 arrived at 1562651680 with bad export cookie 6746082412932528747 Jul 08 22:55:18 fir-md1-s1 kernel: LustreError: 21268:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.25.23@o2ib6 arrived at 1562651718 with bad export cookie 6746082412868746049 Jul 08 22:55:19 fir-md1-s1 kernel: LustreError: 22288:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f23c7482800 ns: mdt-fir-MDT0002_UUID lock: ffff8f24f8f17bc0/0x5d9ee6365a080b92 lrc: 3/0,0 mode: PW/PW res: [0x2c002c148:0x7d:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x50200400000020 nid: 10.8.25.23@o2ib6 remote: 0xeb7608bcd5c0331b expref: 19 pid: 22288 timeout: 0 lvb_type: 0 Jul 08 22:55:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 22:55:43 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 08 22:57:47 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562651860/real 1562651860] req@ffff8f09946f2d00 x1636727215671536/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562651867 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 22:58:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 22:58:36 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 08 23:00:07 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1bb0b36c00 x1631776455317904/t0(0) o101->cb1e051f-12ef-c393-c1de-bc60ba01debc@10.8.13.11@o2ib6:11/0 lens 480/568 e 0 to 0 dl 1562652011 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 23:00:07 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 08 23:00:11 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.103.34@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f204b5a9b00/0x5d9ee6365bd8e59f lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.9.103.34@o2ib4 remote: 0x479fd480653d26e8 expref: 35 pid: 20511 timeout: 1767071 lvb_type: 0 Jul 08 23:00:11 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Jul 08 23:00:11 fir-md1-s1 kernel: LustreError: 20511:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2d327f0400 ns: mdt-fir-MDT0002_UUID lock: ffff8f1d60599680/0x5d9ee6365c10038b lrc: 3/0,0 mode: PW/PW res: [0x2c002c148:0x7e:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x50200000000000 nid: 10.9.103.34@o2ib4 remote: 0x479fd480653d7799 expref: 11 pid: 20511 timeout: 0 lvb_type: 0 Jul 08 23:00:11 fir-md1-s1 kernel: LustreError: 20511:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Jul 08 23:00:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 274acbe5-1f09-1bc7-1d04-06ba56c47198 (at 10.8.25.23@o2ib6) reconnecting Jul 08 23:00:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 08 23:00:50 fir-md1-s1 kernel: LustreError: 25084:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.13.11@o2ib6 arrived at 1562652050 with bad export cookie 6746082412959469584 Jul 08 23:05:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 23:05:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 08 23:06:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 23:06:08 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 08 23:07:17 fir-md1-s1 kernel: Lustre: 20731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562652430/real 1562652430] req@ffff8f1c4eca8c00 x1636727219171664/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562652437 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 23:08:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 23:08:48 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 08 23:10:03 fir-md1-s1 kernel: Lustre: 20465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562652596/real 1562652596] req@ffff8f23a172fb00 x1636727220318720/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562652603 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 23:10:27 fir-md1-s1 kernel: Lustre: 23605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562652620/real 1562652620] req@ffff8f09946f7800 x1636727220492944/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562652627 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 23:10:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 08 23:10:37 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 08 23:12:40 fir-md1-s1 kernel: Lustre: 23745:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562652753/real 1562652753] req@ffff8f34415e6600 x1636727221658160/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 
to 1 dl 1562652760 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 08 23:12:55 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 08 23:16:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 08 23:16:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 23:16:55 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 08 23:18:08 fir-md1-s1 kernel: Lustre: 23714:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34b2cf4500 x1631776456539520/t0(0) o101->cb1e051f-12ef-c393-c1de-bc60ba01debc@10.8.13.11@o2ib6:13/0 lens 480/568 e 1 to 0 dl 1562653093 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 23:18:08 fir-md1-s1 kernel: Lustre: 23714:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 08 23:18:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f34f3317980/0x5d9ee63662fd8af6 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 24 type: IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe55cafa518 expref: 20 pid: 50584 timeout: 1768162 lvb_type: 0 Jul 08 23:18:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 08 23:18:46 fir-md1-s1 kernel: LustreError: 31011:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.30.23@o2ib6 arrived at 1562653126 with bad export cookie 6746082412956598821 Jul 08 23:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 23:19:07 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 08 23:21:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 08 23:21:19 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 08 23:25:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f474a000, cur 1562653546 expire 1562653396 last 1562653319 Jul 08 23:27:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 23:27:13 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 08 23:29:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 23:29:24 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 08 23:31:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 08 23:31:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 08 23:33:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 23:33:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 08 23:37:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 23:37:29 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 08 23:39:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 08 23:39:25 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 08 23:41:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 08 23:41:39 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 08 23:47:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 23:47:35 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 08 23:49:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 08 23:49:27 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 08 23:50:28 fir-md1-s1 kernel: Lustre: 22429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c050bb850 x1631600503280448/t0(0) o3->657250be-d5db-acec-954e-1239d7463eca@10.9.104.65@o2ib4:2/0 lens 488/8632 e 1 to 0 dl 1562655032 ref 2 fl Interpret:/0/0 rc 0/0 Jul 08 23:50:28 fir-md1-s1 kernel: Lustre: 22429:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 08 23:51:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 08 23:51:20 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 08 23:52:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 08 23:52:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 08 23:57:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 08 23:57:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 09 00:00:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 00:00:15 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 09 00:02:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 00:02:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 00:03:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 00:03:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 00:07:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 00:07:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 00:11:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 00:11:34 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 09 00:12:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 00:12:28 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 00:15:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 00:15:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 00:17:46 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c050bd850 x1634176625220320/t0(0) o3->bff671a6-6393-a53b-8c2a-0f521cd0a513@10.9.109.13@o2ib4:21/0 lens 488/16824 e 1 to 0 dl 1562656671 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:17:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 00:17:53 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 09 00:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 00:21:36 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 09 00:23:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 00:23:48 fir-md1-s1 kernel: Lustre: Skipped 37 previous 
similar messages Jul 09 00:28:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 00:28:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 00:28:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 00:28:41 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 00:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 00:31:37 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 09 00:32:11 fir-md1-s1 kernel: Lustre: 21433:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f204858f800 x1631686395367168/t0(0) o101->8a2377b9-dd4d-1468-124f-a22e5b47b9b4@10.8.11.23@o2ib6:15/0 lens 376/1600 e 0 to 0 dl 1562657535 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:33:06 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562657579/real 1562657579] req@ffff8f1eddda4b00 x1636727371386944/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562657586 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 00:33:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 26ac517c-0ccc-5f83-6680-5e234583a053 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d3c036400, cur 1562657628 expire 1562657478 last 1562657401 Jul 09 00:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 00:33:57 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 09 00:34:23 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f36eef19200 x1631538592399248/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:28/0 lens 480/568 e 1 to 0 dl 1562657668 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:34:54 fir-md1-s1 kernel: Lustre: 97670:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1af46e6000 x1631779714566112/t0(0) o101->a5959e71-bc10-93fe-ec09-fd083077a83e@10.8.24.26@o2ib6:29/0 lens 480/568 e 0 to 0 dl 1562657699 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:35:38 fir-md1-s1 kernel: LustreError: 50581:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562657648, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1dc0abc140/0x5d9ee636844156a6 lrc: 3/0,1 mode: --/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 50581 timeout: 0 lvb_type: 0 Jul 09 00:36:16 fir-md1-s1 kernel: Lustre: 21333:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26612f3300 x1631779714621024/t0(0) o101->a5959e71-bc10-93fe-ec09-fd083077a83e@10.8.24.26@o2ib6:21/0 lens 480/568 e 0 to 0 dl 1562657781 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:36:20 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f20c6e82400/0x5d9ee636866668f2 lrc: 
3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a2c73ad9 expref: 44 pid: 21482 timeout: 1772840 lvb_type: 0 Jul 09 00:36:30 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562657783/real 1562657783] req@ffff8f1261da2400 x1636727472516928/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562657790 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 00:36:30 fir-md1-s1 kernel: Lustre: 23635:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f450e784500 x1631538593904768/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:5/0 lens 480/568 e 0 to 0 dl 1562657795 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:36:37 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562657790/real 1562657790] req@ffff8f1261da2400 x1636727472516928/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562657797 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 09 00:38:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 00:38:58 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 09 00:40:59 fir-md1-s1 kernel: Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562658051/real 1562658051] req@ffff8f27997d0c00 x1636727498629504/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562658058 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 00:41:10 fir-md1-s1 kernel: Lustre: 23748:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2692228f00 x1631755762631056/t0(0) 
o101->6102ee9c-599d-0d29-7336-fa30c59b9711@10.8.20.10@o2ib6:15/0 lens 480/568 e 0 to 0 dl 1562658075 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:41:14 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2893fa72c0/0x5d9ee636897d28d3 lrc: 3/0,0 mode: PW/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a22645f217 expref: 45 pid: 21333 timeout: 1773134 lvb_type: 0 Jul 09 00:41:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 00:41:42 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 09 00:41:49 fir-md1-s1 kernel: Lustre: 20555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f19f28b6900 x1631538615196608/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:24/0 lens 480/568 e 0 to 0 dl 1562658114 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:43:08 fir-md1-s1 kernel: Lustre: 20511:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f22efd9bf00 x1631779727080784/t0(0) o101->a5959e71-bc10-93fe-ec09-fd083077a83e@10.8.24.26@o2ib6:13/0 lens 480/568 e 0 to 0 dl 1562658193 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:43:08 fir-md1-s1 kernel: Lustre: 20511:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 09 00:44:17 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 09 00:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 00:44:25 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 09 00:45:28 fir-md1-s1 
kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 00:45:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 00:46:07 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f4392b54200 x1631538634146480/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:12/0 lens 480/568 e 0 to 0 dl 1562658372 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:46:07 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 09 00:46:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.24.26@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1a48f31680/0x5d9ee6368cbb73d3 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.24.26@o2ib6 remote: 0x532ae402ed14dc60 expref: 19 pid: 23733 timeout: 1773466 lvb_type: 0 Jul 09 00:46:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Jul 09 00:47:10 fir-md1-s1 kernel: LustreError: 48115:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.24.26@o2ib6 arrived at 1562658430 with bad export cookie 6746082413791205776 Jul 09 00:50:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 00:50:37 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 09 00:51:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 00:51:53 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 09 00:52:14 fir-md1-s1 kernel: Lustre: 
23455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1f4ed2f200 x1631686418353328/t0(0) o101->8a2377b9-dd4d-1468-124f-a22e5b47b9b4@10.8.11.23@o2ib6:19/0 lens 480/568 e 0 to 0 dl 1562658739 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 00:52:14 fir-md1-s1 kernel: Lustre: 23455:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 09 00:52:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6da12e78-70c3-9109-6c3f-cc3cd573cc58 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28323dd800, cur 1562658777 expire 1562658627 last 1562658550 Jul 09 00:52:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 00:54:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 00:54:36 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 00:55:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.103.34@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f267e75bcc0/0x5d9ee636920d05bc lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.9.103.34@o2ib4 remote: 0x479fd480673e7ecd expref: 258 pid: 23608 timeout: 1773967 lvb_type: 0 Jul 09 00:55:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 09 00:57:08 fir-md1-s1 kernel: LustreError: 23103:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.24.26@o2ib6 arrived at 1562659028 with bad export cookie 6746082413824618848 Jul 09 00:57:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 00:57:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 01:00:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 01:00:52 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 09 01:01:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 01:01:53 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 09 01:04:12 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26a28e6300 x1631686435373952/t0(0) o101->8a2377b9-dd4d-1468-124f-a22e5b47b9b4@10.8.11.23@o2ib6:17/0 lens 480/568 e 0 to 0 dl 1562659457 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 01:04:12 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 09 01:04:28 fir-md1-s1 kernel: Lustre: 23644:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562659461/real 1562659461] req@ffff8f10a97f9500 x1636727615415392/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562659468 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 01:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 01:04:49 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 01:05:17 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562659427, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1dab428240/0x5d9ee6369821c9b1 lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23645 timeout: 0 lvb_type: 0 Jul 09 
01:05:17 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 09 01:05:31 fir-md1-s1 kernel: Lustre: 23652:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562659524/real 1562659524] req@ffff8f2ddae99800 x1636727620559008/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562659531 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 01:06:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1dab428240/0x5d9ee6369821c9b1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.11.23@o2ib6 remote: 0x685a2eace538c518 expref: 19 pid: 23645 timeout: 1774630 lvb_type: 0 Jul 09 01:06:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 09 01:06:23 fir-md1-s1 kernel: Lustre: 23716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562659576/real 1562659576] req@ffff8f26a63edd00 x1636727624842192/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562659583 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 01:09:10 fir-md1-s1 kernel: LustreError: 97660:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562659660, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1e46f3dc40/0x5d9ee6369a847bdb lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97660 timeout: 0 lvb_type: 0 Jul 09 01:09:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 09 01:09:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 01:11:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 01:11:53 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 09 01:12:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 09 01:12:20 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 09 01:14:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 01:14:51 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 09 01:17:26 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2798a27800 x1631779760667632/t0(0) o101->a5959e71-bc10-93fe-ec09-fd083077a83e@10.8.24.26@o2ib6:1/0 lens 480/568 e 0 to 0 dl 1562660251 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 01:17:26 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Jul 09 01:20:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2022bb0000/0x5d9ee636a1f3fbe3 lrc: 3/0,0 mode: PW/PW res: [0x2c002c180:0x7c:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226478ed1 expref: 20 pid: 97672 timeout: 1775495 lvb_type: 0 Jul 09 01:20:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Jul 09 01:20:40 fir-md1-s1 kernel: LustreError: 25085:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.11.6@o2ib6 
arrived at 1562660440 with bad export cookie 6746082414046326733 Jul 09 01:22:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 01:22:17 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 09 01:22:52 fir-md1-s1 kernel: Lustre: 21333:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562660565/real 1562660565] req@ffff8f2fc6dee000 x1636727696927184/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562660572 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 01:22:59 fir-md1-s1 kernel: Lustre: 97655:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562660572/real 1562660572] req@ffff8f1e35fc0c00 x1636727697331696/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562660579 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 09 01:23:06 fir-md1-s1 kernel: Lustre: 97655:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562660579/real 1562660579] req@ffff8f1e35fc0c00 x1636727697331696/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562660586 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 09 01:23:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 01:23:12 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 09 01:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 01:25:19 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 09 01:27:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 01:27:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 01:27:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0ebe3954-8665-c753-62ab-a40297bf966d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b0a2fec00, cur 1562660848 expire 1562660698 last 1562660621 Jul 09 01:27:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 01:30:16 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2622fc0c00 x1631538751564720/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:21/0 lens 480/568 e 0 to 0 dl 1562661021 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 01:30:16 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Jul 09 01:32:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 01:32:20 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 09 01:32:49 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.34@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ae8f6de80/0x5d9ee636a85e7321 lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x4:0x0].0x0 bits 0x40/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.9.103.34@o2ib4 remote: 0x479fd480690b6545 expref: 26 pid: 23692 timeout: 1776229 lvb_type: 0 Jul 09 01:32:49 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 09 01:33:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.29@o2ib6, removing former export from same NID Jul 09 01:33:45 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 01:35:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 01:35:36 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 09 01:39:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 01:39:30 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 09 01:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 01:42:24 fir-md1-s1 kernel: Lustre: Skipped 134 previous similar messages Jul 09 01:44:36 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f202ae13600 x1631538758613536/t0(0) o101->d3a33565-cf5d-2ffd-ba04-f0bdcb5e77d8@10.9.103.34@o2ib4:10/0 lens 480/568 e 0 to 0 dl 1562661880 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 01:44:36 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 09 01:44:39 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f26e0f2ba80/0x5d9ee636ad2aa44c lrc: 3/0,0 mode: PW/PW res: [0x2c002c409:0x3:0x0].0x0 bits 0x40/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a2ce36bf expref: 19 pid: 23747 timeout: 1776939 lvb_type: 0 Jul 09 01:44:39 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 09 01:45:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.8.2@o2ib6, removing former export from same NID Jul 09 01:45:18 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 09 01:45:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 01:45:57 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 09 01:49:00 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.22.20@o2ib6 arrived at 1562662140 with bad export cookie 6746082414362191954 Jul 09 01:51:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 01:51:20 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 01:52:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 01:52:35 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 09 01:55:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 01:55:50 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 09 01:56:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 01:56:01 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 01:56:13 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f206aac7200 x1634160005722400/t0(0) o101->32315fe6-6915-bd82-691a-5460d13ab6db@10.9.103.27@o2ib4:18/0 lens 480/568 e 0 to 0 dl 1562662578 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 01:56:13 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 09 01:56:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: 
ffff8f17dd6f5340/0x5d9ee636b1dc16ac lrc: 3/0,0 mode: PW/PW res: [0x2c002c180:0x7c:0x0].0x0 bits 0x40/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a22648110e expref: 20 pid: 24577 timeout: 1777637 lvb_type: 0 Jul 09 01:56:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 09 01:56:56 fir-md1-s1 kernel: LustreError: 25028:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.11.23@o2ib6 arrived at 1562662616 with bad export cookie 6746082414329937900 Jul 09 02:00:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b30c01a7-931a-8263-f304-966fa9bd47ec (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1dbc608000, cur 1562662817 expire 1562662667 last 1562662590 Jul 09 02:00:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 02:03:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 02:03:01 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 09 02:06:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 02:06:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 09 02:06:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 02:06:06 fir-md1-s1 kernel: Lustre: Skipped 163427 previous similar messages Jul 09 02:07:19 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f34b9f7c200 x1631686488015440/t0(0) o101->8a2377b9-dd4d-1468-124f-a22e5b47b9b4@10.8.11.23@o2ib6:24/0 lens 480/568 e 0 to 0 dl 1562663244 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 02:07:19 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) 
Skipped 6 previous similar messages Jul 09 02:07:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.20.10@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1ccba87980/0x5d9ee636b61779f9 lrc: 3/0,0 mode: PW/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 26 type: IBT flags: 0x60200400000020 nid: 10.8.20.10@o2ib6 remote: 0x1f0dec44cd243bfd expref: 19 pid: 97660 timeout: 1778304 lvb_type: 0 Jul 09 02:07:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 09 02:08:03 fir-md1-s1 kernel: LustreError: 31007:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.11.23@o2ib6 arrived at 1562663283 with bad export cookie 6746082414442404366 Jul 09 02:09:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4a34d9ca-85d3-d986-1b27-304345ee5afb (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fb6df2000, cur 1562663380 expire 1562663230 last 1562663153 Jul 09 02:09:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 02:09:56 fir-md1-s1 kernel: LustreError: 20369:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.25.23@o2ib6 arrived at 1562663396 with bad export cookie 6746082412960696376 Jul 09 02:10:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 02:10:05 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 02:10:28 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562663337, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1e55e99440/0x5d9ee636b6cab2df lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7c:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20555 timeout: 0 lvb_type: 0 Jul 09 02:13:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 02:13:03 fir-md1-s1 kernel: Lustre: Skipped 163418 previous similar messages Jul 09 02:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b7caf93d-2daa-26e6-33b8-897c7ea93dd8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f186cfcf800, cur 1562663621 expire 1562663471 last 1562663394 Jul 09 02:13:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 02:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b7caf93d-2daa-26e6-33b8-897c7ea93dd8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1af16af000, cur 1562663638 expire 1562663488 last 1562663411 Jul 09 02:14:41 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562663591, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1a7ba3e0c0/0x5d9ee636b8308d2f lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24580 timeout: 0 lvb_type: 0 Jul 09 02:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6102ee9c-599d-0d29-7336-fa30c59b9711 (at 10.8.20.10@o2ib6) reconnecting Jul 09 02:16:09 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 02:16:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 02:16:13 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 09 02:16:51 fir-md1-s1 kernel: LustreError: 97638:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562663721, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1638c9d580/0x5d9ee636b8e525a4 lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7c:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97638 timeout: 0 lvb_type: 0 Jul 09 02:16:51 fir-md1-s1 kernel: LustreError: 97638:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Jul 09 02:17:02 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.20.10@o2ib6 arrived at 1562663822 with bad export cookie 6746082414526375925 Jul 09 02:21:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c4743a800, cur 1562664068 expire 1562663918 last 1562663841 Jul 09 02:21:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 09 02:23:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 02:23:29 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 09 02:25:39 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3478e20c00 x1633783776209328/t0(0) o101->274acbe5-1f09-1bc7-1d04-06ba56c47198@10.8.25.23@o2ib6:14/0 lens 480/568 e 0 to 0 dl 1562664344 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 02:25:39 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 09 02:26:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 02:26:14 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 02:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8a2377b9-dd4d-1468-124f-a22e5b47b9b4 (at 10.8.11.23@o2ib6) reconnecting Jul 09 02:26:14 fir-md1-s1 kernel: Lustre: Skipped 153227 previous similar messages Jul 09 02:26:45 fir-md1-s1 kernel: LustreError: 23748:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562664314, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f324ab2e540/0x5d9ee636bc272085 lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23748 timeout: 0 lvb_type: 0 Jul 09 02:26:59 fir-md1-s1 kernel: LustreError: 22288:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562664329, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: 
ffff8f1bba081440/0x5d9ee636bc384037 lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22288 timeout: 0 lvb_type: 0 Jul 09 02:26:59 fir-md1-s1 kernel: LustreError: 22288:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 09 02:27:13 fir-md1-s1 kernel: LustreError: 97643:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562664343, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f171500f980/0x5d9ee636bc48eb3c lrc: 3/0,1 mode: --/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97643 timeout: 0 lvb_type: 0 Jul 09 02:27:44 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2b6c640900/0x5d9ee636bc270a51 lrc: 3/0,0 mode: PW/PW res: [0x2c002c180:0x7b:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226485571 expref: 14 pid: 23748 timeout: 1779524 lvb_type: 0 Jul 09 02:27:44 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 9 previous similar messages Jul 09 02:27:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b7eb93d5-8c42-223b-054b-48b7832859bc (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16d5042000, cur 1562664478 expire 1562664328 last 1562664251 Jul 09 02:31:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 02:31:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 02:33:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 02:33:41 fir-md1-s1 kernel: Lustre: Skipped 153278 previous similar messages Jul 09 02:35:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 188b3bd1-23a6-543f-00d7-3c05d963cb64 (at 10.8.11.9@o2ib6) in 153 seconds. I think it's dead, and I am evicting it. exp ffff8f439c6b6400, cur 1562664956 expire 1562664806 last 1562664803 Jul 09 02:35:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 02:36:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 02:36:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 02:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 02:36:17 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 09 02:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 12cdaed2-086d-f211-b5e6-a7a51b57bbf6 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f247b6eec00, cur 1562665030 expire 1562664880 last 1562664803 Jul 09 02:40:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 02:41:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9df2cc07-ba94-1ea2-6172-f47b09f55c82 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f167f389400, cur 1562665270 expire 1562665120 last 1562665043 Jul 09 02:41:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 09 02:43:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 02:43:53 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 09 02:45:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f691ce56-c75a-3453-35b5-9cac0a6f187c (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e0a998800, cur 1562665512 expire 1562665362 last 1562665285 Jul 09 02:45:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 02:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 02:46:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 02:46:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 02:46:20 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 09 02:50:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 02:53:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 02:53:54 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 09 02:54:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e2e8b6fe-9a67-1617-a235-c6cc38ba57d4 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f3567800, cur 1562666055 expire 1562665905 last 1562665828 Jul 09 02:54:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 02:57:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 02:57:14 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 09 02:57:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 02:57:16 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 09 03:03:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 03:03:55 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 09 03:07:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 03:07:23 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 09 03:09:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 03:09:06 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 09 03:10:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 03:14:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 03:14:00 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 09 03:14:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 03:14:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d4704d07-4d9d-83e2-a0bd-ed6cd3778ee5 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f3629400, cur 1562667294 expire 1562667144 last 1562667067 Jul 09 03:14:54 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 09 03:17:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 03:18:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 03:18:05 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 03:19:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 03:19:07 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 09 03:20:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3826547f-d431-1d44-4311-1be321d906e4 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f189d0a0400, cur 1562667612 expire 1562667462 last 1562667385 Jul 09 03:20:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 03:20:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 03:20:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 03:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 03:24:13 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 09 03:28:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 03:28:11 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 03:29:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 03:29:30 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 03:33:40 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 09 03:34:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 03:34:17 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 09 03:34:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 03:38:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 03:38:40 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 03:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f279a64b000, cur 1562668842 expire 1562668692 last 1562668615 Jul 09 03:40:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 03:40:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 03:40:57 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 03:44:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 03:44:22 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 03:46:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 03:46:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 03:48:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5bf7a607-5118-27c6-615a-5015949857b5 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2383815800, cur 1562669318 expire 1562669168 last 1562669091 Jul 09 03:49:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 03:49:42 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 09 03:52:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 03:52:26 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 09 03:54:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 03:54:25 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 09 04:00:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 04:00:07 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 09 04:00:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 04:00:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 04:03:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 04:03:36 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 09 04:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 04:05:00 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 09 04:11:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 04:11:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 04:13:13 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 09 04:13:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 04:13:57 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 04:15:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 04:15:02 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 04:16:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 04:16:34 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 04:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 04:22:29 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 04:25:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 04:25:13 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 04:25:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 04:25:13 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 09 04:32:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 04:32:37 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 09 04:34:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 04:34:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 04:36:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 04468791-317f-0b85-a724-e5fbf6594482 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2672b24400, cur 1562672169 expire 1562672019 last 1562671942 Jul 09 04:36:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 04:36:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 04:36:16 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 09 04:37:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 04:37:03 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 09 04:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 04:44:23 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 09 04:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 04:46:18 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 04:47:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 09 04:47:28 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 04:48:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 04:48:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 04:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c0143168-7b00-5187-33ee-2ee23ada0e35 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2826e09c00, cur 1562672946 expire 1562672796 last 1562672719 Jul 09 04:49:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 04:54:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 04:54:55 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 04:56:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 04:56:23 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 09 04:57:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 04:57:59 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 09 04:59:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 04:59:17 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 05:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 05:05:05 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 09 05:06:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 05:06:48 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 05:08:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 05:08:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 05:15:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 05:15:14 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 05:17:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 05:17:00 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 09 05:19:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 05:19:07 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 09 05:25:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 05:25:17 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 09 05:27:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 05:27:34 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 09 05:28:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 09 05:29:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 05:29:08 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 09 05:30:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 05:36:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 05:36:16 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 09 05:37:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 05:37:36 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 09 05:40:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 05:40:17 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 05:46:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 05:46:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 05:46:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 05:46:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 05:47:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 05:47:44 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 09 05:50:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 05:50:56 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 09 05:52:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 05:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 05:56:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 05:57:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 05:57:47 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 09 06:01:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 06:01:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 06:06:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 06:06:30 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 06:07:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 06:07:53 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 06:11:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 06:11:27 fir-md1-s1 kernel: Lustre: Skipped 
20 previous similar messages Jul 09 06:15:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 06:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 06:17:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 06:17:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 06:17:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 06:17:57 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 09 06:18:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1103d677-bcdc-c647-1248-807c12ba22a8 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17bcc02c00, cur 1562678290 expire 1562678140 last 1562678063 Jul 09 06:18:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 06:19:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 06:22:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 06:22:31 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 09 06:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 06:27:00 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 09 06:27:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 06:27:58 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 09 06:32:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 06:32:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 06:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3d028c2f-2477-2a00-2f10-1e73838f7457 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f347ed00800, cur 1562679326 expire 1562679176 last 1562679099 Jul 09 06:35:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 06:36:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 06:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 06:37:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 06:37:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 06:38:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 06:38:03 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 09 06:38:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 06:40:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 06:40:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 94cb6da7-d582-1b33-0e0e-34207c65c599 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fc6deec00, cur 1562679648 expire 1562679498 last 1562679421 Jul 09 06:40:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 06:42:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 207 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34ec838800, cur 1562679724 expire 1562679574 last 1562679517 Jul 09 06:42:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 06:42:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 06:42:36 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 09 06:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 06:47:17 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 09 06:48:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 06:48:12 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 09 06:48:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 06:53:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 06:53:57 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 09 06:57:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 06:57:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 06:57:21 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 09 06:58:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 06:58:16 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 09 06:58:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 07:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4a775cba-723d-a68b-1ff4-ae110efb02b5 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f171e1e2000, cur 1562681039 expire 1562680889 last 1562680812 Jul 09 07:04:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 07:04:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 07:04:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 07:04:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 07:07:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 07:07:47 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 07:08:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 07:08:18 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 09 07:09:04 fir-md1-s1 kernel: Lustre: 23631:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f379c388000 x1636571295304848/t0(0) o101->86fa2497-cbd1-3103-4628-e12187b558d9@10.9.101.25@o2ib4:9/0 lens 480/568 e 1 to 0 dl 1562681349 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 07:09:04 fir-md1-s1 kernel: Lustre: 23631:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 09 07:11:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 07:14:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 07:14:33 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 07:18:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 07:18:28 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 07:18:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 07:18:28 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 09 07:24:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 07:24:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 07:24:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 07:24:35 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 09 07:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 07:29:35 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 07:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 07:29:35 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 09 07:35:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 07:35:12 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 09 07:35:12 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply req@ffff8f1d6abe6000 
x1635107050306688/t0(0) o101->83887939-6757-4aea-8b88-f0aa38eb91bc@10.9.108.13@o2ib4:17/0 lens 576/3264 e 0 to 0 dl 1562682917 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 07:35:12 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Jul 09 07:35:22 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply req@ffff8f1dee52f500 x1635089897588224/t0(0) o101->2c084bd6-6132-6737-34f2-02b28f3edaf8@10.9.109.32@o2ib4:27/0 lens 576/0 e 0 to 0 dl 1562682927 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 09 07:35:22 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1059 previous similar messages Jul 09 07:35:25 fir-md1-s1 kernel: Lustre: 23585:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (27:1s); client may timeout. req@ffff8f07da244200 x1638234212823008/t0(0) o101->820a82e4-064a-d399-a663-1803c58bca77@10.9.112.15@o2ib4:24/0 lens 576/592 e 0 to 0 dl 1562682924 ref 1 fl Complete:/0/0 rc 0/0 Jul 09 07:35:25 fir-md1-s1 kernel: LustreError: 21410:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.102.20@o2ib4: deadline 27:1s ago req@ffff8f1338bbd100 x1631568602423920/t0(0) o101->0db2d4e0-bf1e-3689-817d-00b10dcb4858@10.9.102.20@o2ib4:24/0 lens 576/0 e 0 to 0 dl 1562682924 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 09 07:35:25 fir-md1-s1 kernel: LustreError: 21410:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Jul 09 07:35:25 fir-md1-s1 kernel: Lustre: 23585:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 317 previous similar messages Jul 09 07:38:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 07:38:30 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 07:39:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 07:39:36 fir-md1-s1 kernel: Lustre: Skipped 560 previous similar messages Jul 09 07:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 07:39:39 fir-md1-s1 kernel: Lustre: Skipped 503 previous similar messages Jul 09 07:47:12 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:47:12 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 09 07:47:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 07:47:18 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 09 07:47:19 fir-md1-s1 kernel: Lustre: 21446:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22e2333f00 x1631660325794608/t0(0) o101->b7aae4ae-1aa0-9e5d-5ecf-90e4dbcd33de@10.9.101.27@o2ib4:24/0 lens 480/568 e 1 to 0 dl 1562683644 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 07:47:19 fir-md1-s1 kernel: Lustre: 21446:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 171 previous similar messages Jul 09 07:47:22 fir-md1-s1 kernel: Lustre: 23759:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f91a1fb00 x1638235634307120/t0(0) o101->8effb155-901a-a135-30ea-62c11eaaf5e4@10.9.101.55@o2ib4:27/0 lens 480/568 e 1 to 0 dl 1562683647 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 07:47:22 fir-md1-s1 kernel: Lustre: 23759:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 09 07:47:28 fir-md1-s1 kernel: 
LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:47:32 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2d32646900 x1633919152572352/t0(0) o101->b731aa74-f761-f808-ac4e-60997bf2bd97@10.9.101.51@o2ib4:7/0 lens 480/568 e 0 to 0 dl 1562683657 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 07:47:32 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 09 07:47:33 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:47:41 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:47:54 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:48:02 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:49:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 07:49:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 07:50:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 07:50:01 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 07:50:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 07:50:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 07:51:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 83681611-079a-5f4e-8864-a59fd70f2c12 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b5ef8f400, cur 1562683891 expire 1562683741 last 1562683664 Jul 09 07:51:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 07:51:33 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:51:33 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 09 07:51:42 fir-md1-s1 kernel: Lustre: 21680:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0e65a21800 x1634132356673712/t0(0) o101->05133d08-3c30-bc0b-3005-cf52634e4b28@10.9.101.47@o2ib4:17/0 lens 480/568 e 0 to 0 dl 1562683907 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 07:52:09 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 07:52:09 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 09 07:55:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fc3fb400, cur 1562684109 expire 1562683959 last 1562683882 Jul 09 07:55:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 07:57:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 07:57:26 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 09 08:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 08:00:02 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 09 08:00:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 08:00:31 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 09 08:09:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 08:09:39 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 09 08:10:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 08:10:29 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 09 08:10:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 08:10:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 08:13:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 08:13:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 08:18:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3e04e13800, cur 1562685481 expire 1562685331 last 1562685254 Jul 09 08:18:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 08:18:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 08:21:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 08:21:03 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 09 08:21:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 08:21:03 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 09 08:21:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 08:21:07 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 08:24:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 08:30:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 08:30:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 08:31:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 08:31:07 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 08:31:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 08:31:07 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 09 08:31:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 08:31:11 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 08:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f290631c000, cur 1562686693 expire 1562686543 last 1562686466 Jul 09 08:41:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 08:41:05 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 08:41:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 08:41:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 08:41:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 08:41:12 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 09 08:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 907cc100-42d3-4f58-47b7-3f525e8fafee (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ecfc62400, cur 1562686886 expire 1562686736 last 1562686659 Jul 09 08:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 08:41:58 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 08:51:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 08:51:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 08:51:21 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 08:51:21 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 09 08:53:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 08:53:08 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 08:54:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 08:54:58 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 09:01:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 09:01:37 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 09 09:03:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 09:03:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 09 09:03:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 09:03:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 09:11:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 09:11:48 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 09:11:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 09:11:53 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 09 09:13:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 09:13:42 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 09:14:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 09:14:05 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 09:21:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 09:21:58 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 09 09:24:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Jul 09 09:24:08 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 09:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 09:24:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 09:26:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 09:26:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 09:32:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 09:32:18 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 09 09:34:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 09:34:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 09:35:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 09 09:35:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 09:36:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 09:36:50 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 09 09:42:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 09:42:25 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 09 09:45:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 09:45:22 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 09:45:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 09:45:53 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 09:48:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 09:48:04 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 09 09:52:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 09:52:32 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 09 09:55:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 09:55:32 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 09:57:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 09:57:33 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 10:01:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 10:01:22 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 09 10:03:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 10:03:00 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 09 10:05:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 10:05:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 10:07:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 10:07:42 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 10:11:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 10:11:25 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 09 10:13:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 10:13:36 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 09 10:16:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 10:16:11 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 10:18:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 10:18:26 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 10:23:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 10:23:37 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 09 10:24:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 10:24:36 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 09 10:26:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 10:26:12 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 10:28:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 10:28:47 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 09 10:34:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 10:34:16 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 09 10:36:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 10:36:18 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 10:36:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 10:36:37 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 10:40:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 10:40:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 10:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f266a9c7400, cur 1562694042 expire 1562693892 last 1562693815 Jul 09 10:40:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 10:44:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 10:44:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 10:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 10:46:28 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 10:46:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 10:46:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 10:50:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 10:50:43 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 09 10:54:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 10:54:52 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 09 10:56:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 10:56:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 10:59:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 10:59:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 11:00:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 11:00:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 09 11:04:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 11:04:58 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 09 11:06:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 11:06:39 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 11:10:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 11:10:46 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 11:12:45 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:12:45 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 09 11:12:54 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:12:54 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 18 previous similar messages Jul 09 11:13:11 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 09 11:13:11 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 09 11:13:44 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 
657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 09 11:13:44 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 64 previous similar messages Jul 09 11:14:49 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:14:49 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 146 previous similar messages Jul 09 11:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 11:14:58 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 09 11:16:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 11:16:54 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 11:16:58 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:16:58 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 287 previous similar messages Jul 09 11:17:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 11:17:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 11:20:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 11:20:53 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 09 11:21:15 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:21:15 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 610 previous similar messages Jul 09 11:25:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 11:25:02 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 09 11:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 11:27:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 11:29:52 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:29:52 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1230 previous similar messages Jul 09 11:30:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 11:30:59 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 09 11:31:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 11:31:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 11:35:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 11:35:02 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 09 11:37:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 11:37:04 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 11:39:52 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 155648 GRANT, real grant 0 Jul 09 11:39:52 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2374 previous similar messages Jul 09 11:41:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 11:41:07 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 09 11:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 11:45:07 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 09 11:45:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 11:45:38 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 11:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 11:49:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 11:49:53 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 09 11:49:53 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1286 previous similar messages Jul 09 11:51:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 11:51:32 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 09 11:55:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 11:55:13 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 09 11:55:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 11:55:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 11:59:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 11:59:47 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 11:59:55 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 11:59:55 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1188 previous similar messages Jul 09 12:02:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 12:02:23 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 12:05:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 12:05:18 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 09 12:06:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 12:06:14 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 09 12:10:00 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 09 12:10:00 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1195 previous similar messages Jul 09 12:10:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 12:10:11 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 12:12:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 12:12:32 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 09 12:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 12:15:35 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 09 12:20:01 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 09 12:20:01 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1117 previous similar messages Jul 09 12:20:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 12:20:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 12:21:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 12:21:40 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 12:22:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 12:22:37 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 09 12:25:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 12:25:35 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 09 12:30:02 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 12:30:02 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1160 previous similar messages Jul 09 12:30:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 12:30:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 12:32:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 12:32:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 12:33:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 12:33:20 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 09 12:35:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 12:35:42 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 09 12:40:06 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 12:40:06 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1113 previous similar messages Jul 09 12:40:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 12:40:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 12:44:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 12:44:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 12:44:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 12:44:58 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 09 12:46:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 12:46:02 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 09 12:50:25 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 12:50:25 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1208 previous similar messages Jul 09 12:51:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 12:51:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 12:55:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 12:55:01 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 12:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 12:56:04 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 09 12:56:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 12:56:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 13:00:30 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 13:00:30 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1185 previous similar messages Jul 09 13:01:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 13:01:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 13:05:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 13:05:30 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 09 13:06:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 13:06:23 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 09 13:07:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 13:07:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 13:10:31 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 13:10:31 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1147 previous similar messages Jul 09 13:11:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 13:11:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 09 13:15:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 13:15:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 09 13:16:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 13:16:36 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 09 13:20:34 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 13:20:34 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1186 previous similar messages Jul 09 13:21:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 13:21:48 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 13:22:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 13:22:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 13:26:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 13:26:00 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 09 13:26:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 13:26:41 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 13:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f27ce1a3000, cur 1562704177 expire 1562704027 last 1562703950 Jul 09 13:30:52 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 13:30:52 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1203 previous similar messages Jul 09 13:33:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 13:33:29 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 13:33:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 13:33:31 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 13:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f164abf1400, cur 1562704597 expire 1562704447 last 1562704370 Jul 09 13:36:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 13:36:43 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 09 13:37:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 13:37:01 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 13:40:53 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 13:40:53 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1106 previous similar messages Jul 09 13:43:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 13:43:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 09 13:44:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 13:44:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 13:46:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 13:46:46 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 09 13:47:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 13:47:13 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 09 13:50:54 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 13:50:54 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1140 previous similar messages Jul 09 13:53:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 13:53:52 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 09 13:56:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 13:56:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 13:56:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 13:56:47 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 09 13:58:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 13:58:31 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 09 14:00:55 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 14:00:55 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1173 previous similar messages Jul 09 14:03:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 14:03:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 14:07:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 14:07:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 14:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 14:07:21 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 09 14:08:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 14:08:37 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 09 14:10:55 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 14:10:55 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1213 previous similar messages Jul 09 14:14:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 14:14:11 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 09 14:17:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 14:17:21 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 09 14:17:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25227b4c00, cur 1562707070 expire 1562706920 last 1562706843 Jul 09 14:18:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 14:18:42 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 09 14:20:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 14:20:31 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 09 14:20:58 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 09 14:20:58 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1264 previous similar messages Jul 09 14:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 14:24:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 14:27:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 14:27:31 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 09 14:29:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 14:29:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 14:31:02 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 14:31:02 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1353 previous similar messages Jul 09 14:31:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 14:31:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 14:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 14:34:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 14:37:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 14:37:56 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 09 14:40:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 14:40:18 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 14:41:05 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 14:41:05 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1368 previous similar messages Jul 09 14:42:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 14:42:00 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 14:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 14:44:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 14:48:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 14:48:08 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 09 14:50:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 14:50:49 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 09 14:51:06 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 14:51:06 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1142 previous similar messages Jul 09 14:53:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 14:53:01 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 09 14:55:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 14:55:16 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 09 14:58:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 14:58:09 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 09 15:01:20 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 15:01:20 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1215 previous similar messages Jul 09 15:02:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 15:02:44 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 15:04:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 15:04:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 15:05:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 15:05:27 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 15:08:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 15:08:14 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 09 15:11:21 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 15:11:21 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1307 previous similar messages Jul 09 15:13:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 15:13:26 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 09 15:16:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 15:16:13 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 09 15:18:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 15:18:15 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 09 15:18:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 15:18:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 15:21:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ed0048c0-7f49-6510-9744-70056c2a3965 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3228e51800, cur 1562710871 expire 1562710721 last 1562710644 Jul 09 15:21:22 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 15:21:22 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1291 previous similar messages Jul 09 15:23:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 15:23:36 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 09 15:26:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 15:26:57 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 09 15:28:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 15:28:19 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 09 15:29:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 15:29:44 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 09 15:31:26 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 15:31:26 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1297 previous similar messages Jul 09 15:34:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 15:34:10 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 09 15:36:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 15:36:59 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 15:38:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 15:38:23 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 09 15:41:28 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 15:41:28 fir-md1-s1 kernel: LustreError: 21496:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1250 previous similar messages Jul 09 15:42:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 15:42:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 15:45:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 15:45:05 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 09 15:47:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 15:47:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 09 15:48:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 15:48:26 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 09 15:51:35 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 15:51:35 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1355 previous similar messages Jul 09 15:52:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 15:55:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 15:55:59 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 09 15:57:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 15:57:49 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 15:58:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 15:58:42 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 09 16:01:35 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 16:01:35 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1204 previous similar messages Jul 09 16:05:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 16:05:18 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 16:06:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 16:06:32 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 09 16:07:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 16:07:49 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 09 16:08:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 16:08:47 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 09 16:11:44 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 16:11:44 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1300 previous similar messages Jul 09 16:16:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 16:16:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 16:17:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 16:17:03 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 16:17:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f17acad8c00, cur 1562714237 expire 1562714087 last 1562714010 Jul 09 16:17:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 09 16:17:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 16:17:55 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 16:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 16:18:48 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 09 16:21:46 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 16:21:46 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1553 previous similar messages Jul 09 16:27:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 16:27:32 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 09 16:28:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 16:28:00 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 16:28:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 16:28:12 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 09 16:29:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 16:29:16 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 09 16:32:07 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 16:32:07 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1571 previous similar messages Jul 09 16:38:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 16:38:07 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 16:39:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 09 16:39:17 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 09 16:39:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 16:39:40 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 09 16:42:14 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 16:42:14 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1216 previous similar 
messages Jul 09 16:43:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 16:43:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 16:48:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 16:48:13 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 16:49:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 16:49:18 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 09 16:50:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 16:50:26 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 09 16:52:16 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 16:52:16 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1406 previous similar messages Jul 09 16:54:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 16:54:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 16:58:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 16:58:28 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 16:59:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 16:59:41 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 09 17:00:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 17:00:26 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 09 17:02:18 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 17:02:18 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1207 previous similar messages Jul 09 17:06:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 17:06:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 17:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 17:08:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 17:09:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 17:09:43 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 09 17:10:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 17:10:34 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 17:12:20 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 17:12:20 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1453 previous similar messages Jul 09 17:19:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 17:19:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 17:19:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 17:19:34 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 09 17:19:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 17:19:47 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 17:22:24 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 17:22:24 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1219 previous similar messages Jul 09 17:22:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 17:22:42 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 09 17:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 17:29:41 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 09 17:29:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 17:29:51 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 09 17:30:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 17:30:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 17:32:25 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 17:32:25 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1329 previous similar messages Jul 09 17:32:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 17:32:44 fir-md1-s1 kernel: Lustre: Skipped 122 previous similar messages Jul 09 17:39:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 17:39:55 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 09 17:39:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 17:39:57 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 17:40:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 17:40:52 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 17:42:29 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 17:42:29 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1126 previous similar messages Jul 09 17:44:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 17:44:53 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 17:49:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 17:49:56 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 09 17:49:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 09 17:49:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 17:52:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 17:52:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 17:52:30 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 147456 GRANT, real grant 0 Jul 09 17:52:30 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1159 previous similar messages Jul 09 17:54:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 17:54:58 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 09 17:59:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 17:59:59 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 17:59:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 17:59:59 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 09 18:02:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 18:02:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1412 previous similar messages Jul 09 18:04:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 18:04:15 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 18:05:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 18:05:20 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 18:10:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 18:10:00 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 09 18:10:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 18:10:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 18:12:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 18:12:35 fir-md1-s1 kernel: LustreError: 46515:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1240 previous similar messages Jul 09 18:16:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 18:16:32 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 09 18:20:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 18:20:05 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 18:20:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 18:20:05 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 09 18:22:44 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 18:22:44 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1388 previous similar messages Jul 
09 18:25:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 18:25:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 18:27:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 18:29:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 18:29:27 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 18:30:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 18:30:08 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 09 18:30:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 18:30:11 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 18:31:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 18:31:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 18:32:46 fir-md1-s1 kernel: LustreError: 21290:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 18:32:46 fir-md1-s1 kernel: LustreError: 21290:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1199 previous similar messages Jul 09 18:39:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 18:39:28 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 09 18:39:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 18:39:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 18:40:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 18:40:09 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 09 18:40:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 18:40:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 18:42:53 fir-md1-s1 kernel: LustreError: 21290:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 18:42:53 fir-md1-s1 kernel: LustreError: 21290:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1471 previous similar messages Jul 09 18:49:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 18:49:28 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 09 18:50:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) 
Jul 09 18:50:23 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 09 18:51:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 18:51:01 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 18:52:55 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 18:52:55 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1794 previous similar messages Jul 09 18:52:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 18:52:58 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 19:00:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 19:00:12 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 09 19:00:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 19:00:23 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 09 19:01:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 19:01:07 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 09 19:02:56 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 19:02:56 fir-md1-s1 kernel: LustreError: 46585:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1154 previous similar messages Jul 09 19:04:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 09 19:04:09 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 19:10:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 19:10:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 09 19:10:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 19:10:29 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 09 19:11:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 19:11:26 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 19:12:59 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 09 19:12:59 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1240 previous similar messages Jul 09 19:16:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 19:16:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 19:20:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 19:20:30 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 09 19:20:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 19:20:30 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 09 19:21:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 19:21:40 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 09 19:23:07 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 19:23:07 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1383 previous similar messages Jul 09 19:25:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c1f2b1000, cur 1562725502 expire 1562725352 last 1562725275 Jul 09 19:27:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 19:27:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 19:30:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 19:30:46 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 09 19:30:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 19:30:57 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 09 19:31:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 19:31:45 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 19:33:13 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 19:33:13 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1217 previous similar messages Jul 09 19:40:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 19:40:52 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 09 19:41:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 19:41:45 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 19:42:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 19:42:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 19:42:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 09 19:42:56 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 09 19:43:18 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 19:43:18 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1503 previous similar messages Jul 09 19:49:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1dc7bf7c00, cur 1562726991 expire 1562726841 last 1562726764 Jul 09 19:50:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 19:50:56 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 09 19:52:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 19:52:19 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 09 19:53:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 19:53:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 19:53:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 09 19:53:13 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 09 19:53:27 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 19:53:27 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1171 previous similar messages Jul 09 20:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 20:01:13 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 09 20:02:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 20:02:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 20:03:33 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 20:03:33 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1553 previous similar messages Jul 09 20:04:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2de89fb000, cur 1562727884 expire 1562727734 last 1562727657 Jul 09 20:05:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 20:05:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 20:05:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 20:05:24 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 09 20:09:42 fir-md1-s1 kernel: Lustre: 46570:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c050bc850 x1631544738553792/t0(0) o4->d4206ce1-9dd3-fa31-a867-02061bc7b726@10.9.107.34@o2ib4:17/0 lens 2936/448 e 1 to 0 dl 1562728187 ref 2 fl Interpret:/0/0 rc 0/0 Jul 09 20:09:42 fir-md1-s1 kernel: Lustre: 46570:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Jul 09 20:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 20:11:15 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 09 20:12:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 20:12:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 20:13:37 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 20:13:37 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1201 previous similar messages Jul 09 20:15:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 20:15:25 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 09 20:15:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 20:15:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 20:21:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 20:21:19 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 09 20:23:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 20:23:09 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 20:23:38 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 20:23:38 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1305 previous similar messages Jul 09 20:26:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 20:26:04 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 20:27:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 20:27:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 20:31:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 20:31:23 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 09 20:33:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 20:33:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 09 20:33:44 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 20:33:44 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1406 previous similar messages Jul 09 20:36:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 20:36:10 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 09 20:38:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 20:38:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 20:42:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 20:42:18 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 20:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 20:43:48 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 20:43:48 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 20:43:48 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1220 previous similar messages Jul 09 20:46:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 20:46:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 20:49:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 20:49:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 20:52:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 20:52:32 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 09 20:53:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 20:53:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 20:53:54 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 20:53:54 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1412 previous similar messages Jul 09 20:56:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 20:56:34 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 09 21:02:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 21:02:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 21:02:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 21:02:34 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 09 21:03:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 21:03:54 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 21:04:02 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 21:04:02 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1208 previous similar messages Jul 09 21:06:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 21:06:55 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 09 21:12:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 21:12:36 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 09 21:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 21:13:58 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 21:14:02 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 21:14:02 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1370 previous similar messages Jul 09 21:17:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 21:17:34 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 
09 21:22:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 21:22:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 21:22:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 21:22:39 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 09 21:24:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 21:24:01 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 21:24:07 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 21:24:07 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1263 previous similar messages Jul 09 21:27:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 21:27:55 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 09 21:32:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 21:32:43 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 09 21:34:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 21:34:05 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 09 21:34:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 21:34:15 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 09 21:34:16 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 09 21:34:16 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1296 previous similar messages Jul 09 21:38:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 21:38:06 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 09 21:42:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 21:42:47 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 09 21:44:18 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 21:44:18 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1331 previous similar messages Jul 09 21:44:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 21:44:43 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 21:45:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 21:45:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 21:50:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 21:50:00 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 09 21:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 21:52:58 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 09 21:54:20 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 21:54:20 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1662 previous similar messages Jul 09 21:54:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 21:54:49 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 21:55:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 09 21:55:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 21:55:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 61ae4453-9148-72f5-b1f3-a11de36a336a (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34f418a400, cur 1562734553 expire 1562734403 last 1562734326 Jul 09 21:56:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3b5811cb-a5ba-651b-84fc-da5d7c08aeef (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f192e716c00, cur 1562734561 expire 1562734411 last 1562734334 Jul 09 22:00:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 22:00:27 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 09 22:03:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 22:03:27 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 09 22:04:22 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 22:04:22 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1382 previous similar messages Jul 09 22:05:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 22:05:14 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 09 22:06:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 22:06:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 09 22:10:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 22:10:35 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 09 22:13:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 22:13:29 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 09 22:14:30 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 22:14:30 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1367 previous similar messages Jul 09 22:15:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 22:15:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 09 22:16:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 22:16:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 22:21:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 22:21:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 09 22:23:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 22:23:31 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 09 22:24:30 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 22:24:30 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1232 previous similar messages Jul 09 22:26:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 09 22:26:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 22:29:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 22:29:09 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 22:31:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 22:31:32 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 09 22:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 09 22:33:36 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 09 22:34:35 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 22:34:35 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1220 previous similar messages Jul 09 22:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 22:36:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 22:40:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 22:40:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 09 22:41:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 22:41:51 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 09 22:43:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 22:43:52 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 09 22:44:35 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 22:44:35 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1450 previous similar messages Jul 09 22:46:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 09 22:46:31 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 22:51:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 22:51:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 22:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 09 22:53:56 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 09 22:54:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 22:54:10 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 09 22:54:38 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 22:54:38 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1215 previous similar messages Jul 09 22:56:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 22:56:38 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 09 23:03:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 23:03:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 23:04:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 23:04:13 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 09 23:04:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 23:04:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 09 23:04:45 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 23:04:45 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1182 previous similar messages Jul 09 23:06:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 23:06:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 09 23:14:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 09 23:14:17 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 09 23:14:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 23:14:24 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 09 23:14:47 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 23:14:47 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1224 previous similar messages Jul 09 23:15:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 23:15:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 09 23:17:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 23:17:18 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 09 23:19:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3f11760000, cur 1562739594 expire 1562739444 last 1562739367 Jul 09 23:19:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 09 23:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 23:24:20 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 09 23:24:49 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 23:24:49 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1457 previous similar messages Jul 09 23:25:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 23:25:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 23:26:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 23:26:44 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 09 23:27:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 23:27:20 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 09 23:34:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 23:34:22 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 09 23:34:51 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 23:34:51 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1144 previous similar messages Jul 09 23:36:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 23:36:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 09 23:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 23:37:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 09 23:38:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 23:38:56 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 23:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 23:44:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 09 23:44:54 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 23:44:54 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1534 previous similar messages Jul 09 23:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 09 23:48:02 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 09 23:48:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 09 23:48:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 09 23:49:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 09 23:49:05 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 09 23:54:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 09 23:54:27 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 09 23:55:02 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 09 23:55:02 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1260 previous similar messages Jul 09 23:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 09 23:58:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 09 23:59:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 09 23:59:28 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 00:03:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 00:03:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 00:04:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 00:04:28 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 00:05:15 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 00:05:15 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1344 previous similar messages Jul 10 00:08:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 00:08:14 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 00:09:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 00:09:34 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 10 00:14:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 00:14:02 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 00:14:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 00:14:34 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 10 00:15:16 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 00:15:16 fir-md1-s1 kernel: LustreError: 44036:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1363 previous similar messages Jul 10 00:18:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 00:18:51 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 10 00:21:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 10 00:21:36 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 00:24:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 00:24:35 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 10 00:25:28 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 00:25:28 fir-md1-s1 kernel: LustreError: 21454:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1342 previous similar messages Jul 10 00:26:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 00:26:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 00:29:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 00:29:01 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 10 00:32:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 00:32:46 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 00:34:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 00:34:35 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 10 00:35:28 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 00:35:28 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1376 previous similar messages Jul 10 00:36:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 00:36:11 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 10 00:39:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 00:39:27 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 00:43:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 00:43:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 10 00:44:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 00:44:49 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 10 00:45:36 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 00:45:36 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1223 previous similar messages Jul 10 00:49:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 00:49:37 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 00:53:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 00:53:28 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 10 00:53:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f8ad0c800, cur 1562745223 expire 1562745073 last 1562744996 Jul 10 00:54:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 00:54:53 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 10 00:55:40 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 00:55:40 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1270 previous similar messages Jul 10 00:59:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 00:59:48 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 01:04:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 01:04:25 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 01:04:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 01:04:54 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 01:05:54 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 10 01:05:54 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1264 previous similar messages Jul 10 01:10:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 01:10:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 01:10:19 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 01:12:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 01:12:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 01:14:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 01:14:31 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 10 01:14:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 01:14:59 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 10 01:15:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 01:16:04 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 01:16:04 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1378 previous similar messages Jul 10 01:20:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 01:20:19 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 01:24:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 01:24:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 01:24:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 01:24:37 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 10 01:25:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 01:25:03 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 10 01:26:05 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 01:26:05 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1282 previous similar messages Jul 10 01:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 01:31:09 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 01:34:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 01:34:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 01:35:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 01:35:03 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 10 01:35:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 01:35:03 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 10 01:36:08 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 10 01:36:08 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1274 previous similar messages Jul 10 01:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 01:41:25 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 10 01:44:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 01:44:42 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 01:45:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 01:45:04 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 10 01:46:10 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 01:46:10 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1440 previous similar messages Jul 10 01:46:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 01:46:19 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 10 01:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 01:51:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 10 01:54:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 01:54:51 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 01:55:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 01:55:08 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 10 01:56:15 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 01:56:15 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1302 previous similar messages Jul 10 01:56:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 01:56:24 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 10 02:01:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 02:01:56 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 02:06:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 02:06:00 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 10 02:06:23 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 02:06:23 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1413 previous similar messages Jul 10 02:06:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 02:06:29 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 02:09:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 02:09:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 02:12:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 02:12:19 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 02:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 02:16:28 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 10 02:16:35 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 32768 GRANT, real grant 0 Jul 10 02:16:35 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 910 previous similar messages Jul 10 02:17:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 02:17:51 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 10 02:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 02:22:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 02:23:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 02:23:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 02:26:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 02:26:33 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 10 02:26:36 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 02:26:36 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 746 previous similar messages Jul 10 02:27:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 02:27:55 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 02:32:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 02:32:41 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 02:36:37 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 10 02:36:37 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 624 previous similar messages Jul 10 02:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 02:36:48 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 10 02:39:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 02:39:42 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 10 02:42:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 02:42:56 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 
02:43:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24208bec00, cur 1562751820 expire 1562751670 last 1562751593 Jul 10 02:46:40 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 10 02:46:40 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 721 previous similar messages Jul 10 02:46:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ccef14c00, cur 1562752003 expire 1562751853 last 1562751776 Jul 10 02:46:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 02:46:58 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 02:50:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 02:50:41 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 10 02:52:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 02:52:05 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 10 02:52:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 02:52:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 02:53:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 02:53:35 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 10 02:56:41 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 02:56:41 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 776 previous similar messages Jul 10 02:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 02:57:24 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 02:57:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 03:03:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 03:03:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 10 03:03:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 03:03:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 03:06:44 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 03:06:44 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 824 previous similar messages Jul 10 03:07:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 03:07:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 03:07:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 03:07:51 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 10 03:13:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 03:13:07 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 10 03:14:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 03:14:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 03:16:45 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 03:16:45 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 554 previous similar messages Jul 10 03:17:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 03:17:53 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 10 03:21:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 03:21:08 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 03:23:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 03:23:24 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 10 03:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 03:24:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 03:25:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f340038f-9ca0-b54d-f024-5ea93ca12997 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f41e39000, cur 1562754351 expire 1562754201 last 1562754124 Jul 10 03:26:48 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 03:26:48 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 303 previous similar messages Jul 10 03:28:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 03:28:07 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 10 03:33:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 10 03:33:40 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 03:34:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 03:34:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 03:37:06 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 10 03:37:06 fir-md1-s1 kernel: LustreError: 
46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 327 previous similar messages Jul 10 03:38:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 03:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 03:38:08 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 10 03:38:08 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 03:43:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 03:43:42 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 03:44:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 03:44:37 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 10 03:47:09 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 03:47:09 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 506 previous similar messages Jul 10 03:48:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 03:48:14 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 10 03:50:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 03:50:05 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 03:55:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 03:55:19 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 10 03:55:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 03:55:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 10 03:57:18 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 03:57:18 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 615 previous similar messages Jul 10 03:58:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 03:58:14 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 10 04:05:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 04:05:28 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 04:05:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 04:05:36 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 04:06:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 04:06:09 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 04:07:22 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 04:07:22 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 532 previous similar messages Jul 10 04:08:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 04:08:14 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 04:15:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 04:15:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 04:16:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 04:16:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 04:17:24 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 04:17:24 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 675 previous similar messages Jul 10 04:18:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 04:18:20 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 10 04:25:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 04:25:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 04:25:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 04:25:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 04:27:26 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 04:27:26 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 715 previous similar messages Jul 10 04:28:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 04:28:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 04:28:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 04:28:21 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 10 04:35:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 04:35:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 04:36:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 04:36:20 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 10 04:37:28 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 04:37:28 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 746 previous similar messages Jul 10 04:38:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 04:38:10 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 10 04:38:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 04:38:22 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 10 04:39:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 81ca2f92-1c99-e8fa-d30d-6f44b638b624 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f16a61000, cur 1562758780 expire 1562758630 last 1562758553 Jul 10 04:39:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 04:46:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 04:46:48 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 04:47:29 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 04:47:29 fir-md1-s1 kernel: LustreError: 57558:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 538 previous similar messages Jul 10 04:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 04:48:22 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 10 04:49:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 04:49:14 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 10 04:57:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 04:57:01 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 04:57:29 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 04:57:29 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 468 previous similar messages Jul 10 04:58:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 04:58:23 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 10 04:58:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 04:58:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 04:59:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 04:59:19 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 10 05:00:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 05:00:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 05:04:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 05:04:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 05:07:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 05:07:04 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 10 05:07:53 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 05:07:53 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 288 previous similar messages Jul 10 05:08:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 05:08:28 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Jul 10 05:09:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 05:09:26 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 10 05:17:13 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 05:17:13 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 05:17:56 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 05:17:56 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 480 previous similar messages Jul 10 05:18:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 05:18:29 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 10 05:19:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 05:19:27 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 10 05:20:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 05:20:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 05:25:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 05:27:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 05:27:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 05:27:56 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 05:27:56 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 476 previous similar messages Jul 10 05:28:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 05:28:36 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 10 05:29:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 05:29:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 05:29:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 05:29:29 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 10 05:34:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 05:37:58 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 05:37:58 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 658 previous similar messages Jul 10 05:38:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 05:38:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 05:38:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 05:38:42 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 10 05:40:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 05:40:03 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 05:42:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 05:47:59 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 05:47:59 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 568 previous similar messages Jul 10 05:48:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 05:48:00 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 05:48:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 05:48:57 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 10 05:50:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 05:50:12 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 10 05:56:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 05:56:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 05:58:01 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 05:58:01 fir-md1-s1 kernel: LustreError: 20500:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 696 previous similar messages Jul 10 05:58:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 05:58:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 05:59:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 05:59:03 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 10 06:00:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 06:00:23 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 10 06:08:06 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 06:08:06 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 759 previous similar messages Jul 10 06:08:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 06:08:17 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 06:09:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 06:09:27 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 10 06:10:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 06:10:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 06:14:32 
fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 06:14:32 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 06:18:08 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 06:18:08 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 580 previous similar messages Jul 10 06:18:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 06:18:34 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 10 06:19:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 06:19:39 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 06:20:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 06:20:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 10 06:28:16 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 06:28:16 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 528 previous similar messages Jul 10 06:28:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 06:28:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 06:29:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 06:29:51 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 10 
06:30:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 06:30:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 06:32:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 06:32:39 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 10 06:38:18 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f100ed100 x1638075991861936/t0(0) o101->b041cef5-fff9-4fc6-cc5f-62c5a80e124b@10.9.0.81@o2ib4:23/0 lens 480/568 e 1 to 0 dl 1562765903 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 06:38:25 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 06:38:25 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 548 previous similar messages Jul 10 06:38:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b041cef5-fff9-4fc6-cc5f-62c5a80e124b (at 10.9.0.81@o2ib4) reconnecting Jul 10 06:38:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 06:39:09 fir-md1-s1 kernel: Lustre: 22007:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1ad960b600 x1638276144880736/t0(0) o36->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:14/0 lens 512/2888 e 0 to 0 dl 1562765954 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 06:39:33 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562765883, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f212eaf5a00/0x5d9ee638bed62f8c lrc: 3/0,1 
mode: --/PW res: [0x200029cf7:0x9d:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23704 timeout: 0 lvb_type: 0 Jul 10 06:39:40 fir-md1-s1 kernel: Lustre: 10582:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-19), not sending early reply req@ffff8f2ecce84800 x1638276144883648/t0(0) o101->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:15/0 lens 576/3264 e 0 to 0 dl 1562765985 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 06:40:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 06:40:08 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 10 06:40:14 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562765924, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f24f26e8480/0x5d9ee638bef3c93d lrc: 3/0,1 mode: --/EX res: [0x200029cf7:0x9d:0x0].0x0 bits 0x3/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22283 timeout: 0 lvb_type: 0 Jul 10 06:40:31 fir-md1-s1 kernel: LustreError: 50582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562765941, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0ab91d5e80/0x5d9ee638befea93e lrc: 3/1,0 mode: --/PR res: [0x200025ce2:0x1fa1:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50582 timeout: 0 lvb_type: 0 Jul 10 06:40:32 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.0.81@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f22955086c0/0x5d9ee638bed60c69 lrc: 3/0,0 mode: PR/PR res: [0x200029cf7:0x9d:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 
0x60200400000020 nid: 10.9.0.81@o2ib4 remote: 0x483a08d1111db65 expref: 46 pid: 21430 timeout: 1881092 lvb_type: 0 Jul 10 06:40:32 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 10 06:40:32 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f20269e5800 ns: mdt-fir-MDT0000_UUID lock: ffff8f212eaf5a00/0x5d9ee638bed62f8c lrc: 3/0,0 mode: PW/PW res: [0x200029cf7:0x9d:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.9.0.81@o2ib4 remote: 0x483a08d1111db73 expref: 33 pid: 23704 timeout: 0 lvb_type: 0 Jul 10 06:44:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b041cef5-fff9-4fc6-cc5f-62c5a80e124b (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f37b3b6c800, cur 1562766259 expire 1562766109 last 1562766032 Jul 10 06:44:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 06:44:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 06:44:48 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 06:48:34 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 10 06:48:34 fir-md1-s1 kernel: LustreError: 21996:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 448 previous similar messages Jul 10 06:48:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 06:48:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 06:50:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 06:50:21 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 10 06:55:26 fir-md1-s1 kernel: 
Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 06:55:26 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 10 06:58:36 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 06:58:36 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 627 previous similar messages Jul 10 06:59:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 06:59:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 07:00:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 07:00:28 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 10 07:03:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 07:03:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 07:04:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 07:05:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 07:05:55 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 07:08:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 07:08:43 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 07:08:43 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 596 previous similar messages Jul 10 07:09:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 07:09:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 07:10:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 07:10:29 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 07:17:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 07:18:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 07:18:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 10 07:18:45 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 07:18:45 fir-md1-s1 kernel: LustreError: 46577:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 607 previous similar messages Jul 10 07:19:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 07:19:25 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 07:20:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 07:20:44 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 10 07:27:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not 
available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 07:27:41 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 07:28:47 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 07:28:47 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 594 previous similar messages Jul 10 07:29:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 07:29:43 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 10 07:30:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 07:30:01 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 10 07:30:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 07:30:51 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 07:38:49 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 07:38:49 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 700 previous similar messages Jul 10 07:39:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 07:39:44 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 10 07:40:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 07:40:10 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 07:41:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 07:41:01 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 10 07:42:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 07:42:42 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 10 07:48:55 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 07:48:55 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jul 10 07:50:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 07:50:11 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 07:51:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 07:51:16 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 07:52:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 07:52:36 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 10 07:55:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 07:55:00 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 07:58:56 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 07:58:56 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 486 previous similar messages Jul 10 07:59:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2bad2d99-434c-c071-8f86-46075da8e78f (at 10.9.115.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0af8d72800, cur 1562770761 expire 1562770611 last 1562770534 Jul 10 08:01:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 08:01:12 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 08:01:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 08:01:29 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 08:03:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 08:03:08 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 10 08:05:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 08:05:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 08:09:07 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 08:09:07 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 464 previous similar messages Jul 10 08:12:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 08:12:18 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 08:12:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 08:12:18 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 08:13:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 08:13:33 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 10 08:15:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 08:15:52 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 10 08:19:09 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 32768 GRANT, real grant 0 Jul 10 08:19:09 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 436 previous similar messages Jul 10 08:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 08:22:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 08:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 08:22:35 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 08:23:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 08:23:41 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 10 08:26:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 08:26:42 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 08:29:16 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 10 08:29:16 fir-md1-s1 kernel: LustreError: 21995:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 558 previous similar messages Jul 10 08:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 08:32:50 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 08:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 08:32:50 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 10 08:36:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 08:36:28 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 10 08:36:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 08:36:49 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 10 08:39:23 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 08:39:23 fir-md1-s1 kernel: LustreError: 27580:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 604 previous similar messages Jul 10 08:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f20a9c2a400, cur 1562773351 expire 1562773201 last 1562773124 Jul 10 08:42:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 08:43:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 08:43:20 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 08:43:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 08:43:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 08:47:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 08:47:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 10 08:48:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 08:48:45 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 08:49:25 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 08:49:25 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jul 10 08:53:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 08:53:26 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 08:53:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 08:53:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 08:57:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 08:57:02 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 10 08:59:27 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 08:59:27 fir-md1-s1 kernel: LustreError: 21385:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 502 previous similar messages Jul 10 09:00:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 09:00:51 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 09:02:33 fir-md1-s1 kernel: Lustre: 21452:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ef25f9e00 x1638088148657920/t0(0) o101->5282ca62-d94c-33fd-9d61-31ebcd98e0af@10.9.116.1@o2ib4:8/0 lens 576/3264 e 1 to 0 dl 1562774558 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 09:02:33 fir-md1-s1 kernel: Lustre: 21452:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 282 previous similar messages Jul 10 09:02:34 fir-md1-s1 kernel: Lustre: 21333:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ecb5a1b00 x1634138221307008/t0(0) o101->cfa699f5-5c9c-ea69-d701-26f52d68dba1@10.9.101.37@o2ib4:9/0 lens 328/0 e 1 to 0 dl 1562774559 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 10 09:02:34 fir-md1-s1 kernel: Lustre: 21333:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 288 previous similar messages Jul 10 09:02:35 fir-md1-s1 kernel: Lustre: 22284:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ec5385a00 x1636456297044992/t0(0) o101->05e7d18b-fd1f-bd0e-dca1-20091393d8f8@10.9.108.66@o2ib4:10/0 lens 576/0 e 1 to 0 dl 1562774560 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 10 09:02:35 fir-md1-s1 kernel: Lustre: 22284:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 139 previous similar messages Jul 10 09:02:37 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f38b7611200 x1636457995286064/t0(0) o101->da0ec9bf-1999-ba8d-5389-20d1ebbaa0f5@10.9.107.72@o2ib4:12/0 lens 576/3264 e 1 to 0 dl 1562774562 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 09:02:37 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 217 previous similar messages Jul 10 09:02:41 fir-md1-s1 kernel: Lustre: 
21678:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f38b7612100 x1631604556695008/t0(0) o101->7f8dc145-a081-da87-1da4-154358301486@10.9.108.1@o2ib4:16/0 lens 576/3264 e 1 to 0 dl 1562774566 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 09:02:41 fir-md1-s1 kernel: Lustre: 21678:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1108 previous similar messages Jul 10 09:02:47 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.0.81@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3692f3b840/0x5d9ee638b6b6626a lrc: 3/0,0 mode: PR/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 886 type: IBT flags: 0x60200400000020 nid: 10.9.0.81@o2ib4 remote: 0x483a08d1110c556 expref: 18 pid: 20554 timeout: 1889627 lvb_type: 0 Jul 10 09:02:48 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f26a7be8000 x1631558636133808/t0(0) o101->9c58438d-335a-1a4a-8b6e-0ac0b859df8d@10.8.12.23@o2ib6:8/0 lens 576/0 e 1 to 0 dl 1562774558 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 10 09:02:48 fir-md1-s1 kernel: Lustre: 23684:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. 
req@ffff8f348ab51200 x1631558636133888/t0(0) o101->9c58438d-335a-1a4a-8b6e-0ac0b859df8d@10.8.12.23@o2ib6:8/0 lens 576/0 e 1 to 0 dl 1562774558 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 10 09:02:48 fir-md1-s1 kernel: LustreError: 20545:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.10.21@o2ib6: deadline 20:4s ago req@ffff8f21dc32f200 x1632260917013200/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:13/0 lens 576/0 e 0 to 0 dl 1562774563 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 10 09:02:48 fir-md1-s1 kernel: LustreError: 26254:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.10.21@o2ib6: deadline 20:4s ago req@ffff8f25e2c06000 x1632260917013168/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:13/0 lens 576/0 e 0 to 0 dl 1562774563 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 10 09:02:48 fir-md1-s1 kernel: LustreError: 20545:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 40 previous similar messages Jul 10 09:02:48 fir-md1-s1 kernel: LustreError: 26254:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 40 previous similar messages Jul 10 09:02:48 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1501 previous similar messages Jul 10 09:03:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 09:03:35 fir-md1-s1 kernel: Lustre: Skipped 669 previous similar messages Jul 10 09:03:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 09:03:47 fir-md1-s1 kernel: Lustre: Skipped 657 previous similar messages Jul 10 09:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b041cef5-fff9-4fc6-cc5f-62c5a80e124b (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f442670e400, cur 1562774810 expire 1562774660 last 1562774583 Jul 10 09:09:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 09:09:20 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 10 09:09:32 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 09:09:32 fir-md1-s1 kernel: LustreError: 21388:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 627 previous similar messages Jul 10 09:10:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 09:10:55 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 09:13:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 09:13:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 09:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 09:14:08 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 09:19:34 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 10 09:19:34 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 523 previous similar messages Jul 10 09:20:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 09:20:45 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 10 09:21:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 09:21:55 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 10 09:23:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 09:23:43 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 09:24:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 09:24:39 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 09:25:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96bed43a-b7c9-0e49-67fd-9247dc304082 (at 10.8.30.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2508cc6c00, cur 1562775927 expire 1562775777 last 1562775700 Jul 10 09:29:41 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 09:29:41 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 597 previous similar messages Jul 10 09:32:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 10 09:32:32 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 10 09:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 09:33:53 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 09:34:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 09:34:39 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 10 09:35:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 09:35:05 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 09:39:43 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 09:39:43 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 467 previous similar messages Jul 10 09:43:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 09:43:36 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 10 09:44:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 09:44:06 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 09:45:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23ea245400, cur 1562777120 expire 1562776970 last 1562776893 Jul 10 09:45:20 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 10 09:45:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 09:45:25 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 10 09:45:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 09:45:51 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 09:47:25 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16618f8c00, cur 1562777245 expire 1562777095 last 1562777018 Jul 10 09:49:45 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 09:49:45 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 565 previous similar messages Jul 10 09:53:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 10 09:53:56 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 10 09:54:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 48055890-ac7a-40c2-f14b-00e7fd6a0cc0 (at 10.8.22.30@o2ib6) Jul 10 09:54:07 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 10 09:55:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d75f4c800, cur 1562777749 expire 1562777599 last 1562777522 Jul 10 09:56:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 09:56:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 09:57:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 09:57:04 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 10 09:57:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d8d221000, cur 1562777827 expire 1562777677 last 1562777600 Jul 10 09:59:45 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 09:59:45 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 570 previous similar messages Jul 10 10:04:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 10:04:33 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 10 10:05:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 10:05:26 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 10 10:07:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 10:07:18 fir-md1-s1 kernel: LustreError: Skipped 14 previous similar messages Jul 10 10:07:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 10:07:20 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 10:09:47 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 10:09:47 fir-md1-s1 kernel: LustreError: 46513:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 714 previous similar messages Jul 10 10:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 10:14:46 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 10:17:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 10:17:08 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 10 10:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 10:17:27 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 10:18:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 10:18:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 10:19:50 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 10:19:50 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 708 previous similar messages Jul 10 10:19:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22a1520c00, cur 1562779193 expire 1562779043 last 1562778966 Jul 10 10:24:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 10:24:51 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 10:27:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 10:27:29 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 10 10:27:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 10:27:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 10:29:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 10:29:21 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 10 10:29:57 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 10:29:57 fir-md1-s1 kernel: LustreError: 46565:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 711 previous similar messages Jul 10 10:35:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 10:35:17 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 10:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 10:37:30 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 10:37:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 10:37:55 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 10 10:39:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 10:39:29 fir-md1-s1 kernel: LustreError: Skipped 16 previous similar messages Jul 10 10:40:09 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 10:40:09 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 656 previous similar messages Jul 10 10:45:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 10:45:20 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 10:47:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 10:47:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 10 10:48:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 10:48:44 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 10 10:50:11 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 10:50:11 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 651 previous similar messages Jul 10 10:52:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 10:52:21 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 10:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 10:55:21 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 10 10:58:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 10:58:01 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 10:58:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 10:58:49 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 10 11:00:20 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 11:00:20 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 713 previous similar messages Jul 10 11:02:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 11:02:52 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 10 11:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 11:05:36 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 11:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 11:08:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 11:09:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 11:09:45 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 10 11:10:20 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 11:10:20 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 583 previous similar messages Jul 10 11:14:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 11:14:10 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 10 11:15:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 11:15:56 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 11:19:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 11:19:17 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 11:20:20 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 11:20:20 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 624 previous similar messages Jul 10 11:20:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 11:20:49 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 10 11:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 11:26:15 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 10 11:28:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 11:28:02 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 11:29:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 11:29:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 11:30:21 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 11:30:21 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 548 previous similar messages Jul 10 11:31:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 11:31:07 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 10 11:36:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2efcb44000, cur 1562783774 expire 1562783624 last 1562783547 Jul 10 11:36:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 11:36:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 11:38:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 11:38:28 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 11:39:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 11:39:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 10 11:40:24 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 11:40:24 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 608 previous similar messages Jul 10 11:41:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 11:41:39 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 10 11:44:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f30f07c00, cur 1562784274 expire 1562784124 last 1562784047 Jul 10 11:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 11:46:35 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 10 11:48:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 11:48:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 11:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 11:49:33 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 11:50:29 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 11:50:29 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 627 previous similar messages Jul 10 11:51:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 11:51:45 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 10 11:53:41 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e912b1f0-4b75-614b-7c78-541b22033095 (at 10.9.0.81@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fb52cac00, cur 1562784821 expire 1562784671 last 1562784594 Jul 10 11:56:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 11:56:36 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 10 11:59:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 11:59:34 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 11:59:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 11:59:55 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 12:00:31 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 12:00:31 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 737 previous similar messages Jul 10 12:02:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 12:02:49 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 10 12:06:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 12:06:49 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 10 12:09:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 55cf06b7-ada2-2c2a-4329-eb93e8b4cb23 (at 10.9.104.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19630e0000, cur 1562785766 expire 1562785616 last 1562785539 Jul 10 12:09:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 55cf06b7-ada2-2c2a-4329-eb93e8b4cb23 (at 10.9.104.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34e17d3400, cur 1562785775 expire 1562785625 last 1562785548 Jul 10 12:09:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 12:10:34 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 12:10:34 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 751 previous similar messages Jul 10 12:10:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 12:10:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 12:12:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 12:12:23 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 12:13:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 12:13:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 10 12:17:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 12:17:03 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 12:20:40 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 12:20:40 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 693 previous similar messages Jul 10 12:20:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 12:20:49 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 10 12:22:39 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 12:22:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 12:23:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 12:23:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 10 12:27:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 12:27:39 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 10 12:30:42 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 12:30:42 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 673 previous similar messages Jul 10 12:30:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 12:30:50 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 10 12:33:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 12:33:46 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 12:34:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 12:34:05 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 10 12:37:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 12:37:46 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 10 12:40:48 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 12:40:48 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 624 previous similar messages Jul 10 12:40:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 12:40:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 12:44:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 12:44:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 12:45:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 12:45:10 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 12:47:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 12:47:51 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 10 12:50:49 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 12:50:49 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 609 previous similar messages Jul 10 12:52:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 12:52:51 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 10 12:55:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 12:55:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 12:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 12:57:53 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 12:58:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 12:58:42 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 10 13:00:53 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 13:00:53 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 544 previous similar messages Jul 10 13:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 13:02:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 13:05:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 13:05:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 13:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 13:07:55 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 13:08:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 13:08:49 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 13:10:55 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 13:10:55 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 619 previous similar messages Jul 10 13:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 13:13:05 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 13:15:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 13:15:30 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 10 13:18:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 13:18:18 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 10 13:18:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 13:18:57 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 10 13:21:01 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 10 13:21:01 fir-md1-s1 kernel: LustreError: 21987:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 601 previous similar messages Jul 10 13:23:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 13:23:12 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 13:26:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 13:26:00 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 10 13:28:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 13:28:51 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 10 13:29:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 13:29:56 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 13:31:06 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 13:31:06 fir-md1-s1 kernel: LustreError: 66901:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 761 previous similar messages Jul 10 13:33:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 13:33:23 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 10 13:36:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 13:36:02 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 10 13:38:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 13:38:53 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 10 13:41:14 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 13:41:14 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 631 previous similar messages Jul 10 13:43:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 13:43:40 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 13:44:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 13:44:38 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 10 13:46:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 13:46:12 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 10 13:49:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 13:49:03 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 10 13:51:15 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 13:51:15 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 790 previous similar messages Jul 10 13:53:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ae14ed36-ba60-4740-8815-84f6adaeeb15 (at 10.9.114.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f453e965800, cur 1562791993 expire 1562791843 last 1562791766 Jul 10 13:53:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 13:53:47 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 10 13:56:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 13:56:18 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 13:56:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 13:56:21 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 10 13:59:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 13:59:06 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 10 14:01:15 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 14:01:15 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 737 previous similar messages Jul 10 14:03:43 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562792616/real 1562792616] req@ffff8f2644ac4b00 x1636729522294720/t0(0) o104->fir-MDT0002@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562792623 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 14:03:50 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562792623/real 1562792623] req@ffff8f2644ac4b00 x1636729522294720/t0(0) o104->fir-MDT0002@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562792630 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 14:03:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6ad7e9e1-dbc5-f9a1-bdd9-743173a51d0b (at 10.8.17.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17f9783400, cur 1562792631 expire 1562792481 last 1562792404 Jul 10 14:03:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 14:03:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 14:03:51 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 10 14:06:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 14:06:21 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 14:06:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 14:06:22 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 10 14:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 14:09:49 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 10 14:11:23 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 14:11:23 fir-md1-s1 kernel: LustreError: 22269:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 573 previous similar messages Jul 10 14:14:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 14:14:12 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 10 14:16:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 14:16:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 10 14:18:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 14:18:21 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 14:19:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 14:19:51 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 10 14:21:24 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 14:21:24 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 623 previous similar messages Jul 10 14:24:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 14:24:28 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 14:27:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 14:27:05 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 10 14:28:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 14:28:36 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 14:30:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 14:30:38 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 10 14:31:31 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 14:31:31 fir-md1-s1 kernel: LustreError: 20501:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 614 previous similar messages Jul 10 14:34:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 14:34:35 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 14:37:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 14:37:48 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 14:40:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 14:40:52 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 10 14:41:32 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 14:41:32 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 665 previous similar messages Jul 10 14:42:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 14:42:30 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 14:44:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 14:44:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 14:47:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 14:47:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 14:51:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 14:51:02 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 10 14:51:34 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 14:51:34 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 593 previous similar messages Jul 10 14:54:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 14:54:54 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 10 14:57:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 14:57:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 14:59:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 14:59:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 10 15:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 15:01:08 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 10 15:01:36 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 15:01:36 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 687 previous similar messages Jul 10 15:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 15:05:00 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 15:09:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 15:09:23 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 10 15:11:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 15:11:09 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 10 15:11:37 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 15:11:37 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 658 previous similar messages Jul 10 15:14:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 15:14:45 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 15:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e097a71c-e88a-824e-4bcb-410d766486a5 (at 10.9.114.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fbf52400, cur 1562796886 expire 1562796736 last 1562796659 Jul 10 15:14:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 15:15:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 15:15:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 15:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c50a2569-5f68-c0c4-a8b8-bfb61fe4dbbb (at 10.9.114.5@o2ib4) in 215 seconds. I think it's dead, and I am evicting it. exp ffff8f453868f000, cur 1562796962 expire 1562796812 last 1562796747 Jul 10 15:16:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 15:19:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 15:19:23 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 10 15:21:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 15:21:10 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 10 15:21:43 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 15:21:43 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 729 previous similar messages Jul 10 15:25:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 15:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 15:25:29 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 15:29:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 15:29:32 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 10 15:31:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 15:31:11 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 10 15:31:46 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 15:31:46 fir-md1-s1 kernel: LustreError: 46567:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 648 previous similar messages Jul 10 15:35:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 15:35:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 15:36:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 15:39:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 15:39:42 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 10 15:40:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c33dfd3e-93e2-b1e4-c92b-6be01740e2e1 (at 10.9.115.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fdbd6000, cur 1562798440 expire 1562798290 last 1562798213 Jul 10 15:40:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 15:41:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 15:41:28 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 10 15:41:48 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 15:41:48 fir-md1-s1 kernel: LustreError: 46570:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 650 previous similar messages Jul 10 15:45:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 15:45:36 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 10 15:49:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 15:49:45 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 15:51:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 15:51:34 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 10 15:51:50 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 15:51:50 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 608 previous similar messages Jul 10 15:53:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250d936800, cur 1562799220 expire 1562799070 last 1562798993 Jul 10 15:53:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 15:55:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 15:55:46 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 15:59:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 15:59:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 15:59:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 15:59:58 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 16:01:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 16:01:35 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 10 16:01:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 16:01:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 16:01:51 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 16:01:51 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 678 previous similar messages Jul 10 16:05:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 16:05:57 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 16:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0b6cd9d400, cur 1562800054 expire 1562799904 last 1562799827 Jul 10 16:10:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 16:10:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 16:10:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 16:10:08 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 16:11:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 16:11:36 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 10 16:11:54 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 16:11:54 fir-md1-s1 kernel: LustreError: 46537:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 689 previous similar messages Jul 10 16:16:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 16:16:05 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 16:20:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 16:20:09 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 10 16:21:57 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 16:21:57 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 724 previous similar messages Jul 10 16:21:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 16:21:58 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 10 16:22:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 16:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 16:26:32 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 16:30:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 16:30:49 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 10 16:31:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 16:31:59 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 10 16:32:04 fir-md1-s1 kernel: LustreError: 22974:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 16:32:04 fir-md1-s1 kernel: LustreError: 22974:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 633 previous similar messages Jul 10 16:35:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 16:35:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 16:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 16:37:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 16:40:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 16:40:52 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 10 16:42:05 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 16:42:05 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 708 previous similar messages Jul 10 16:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 16:42:07 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 10 16:45:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 16:45:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 16:47:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 16:47:11 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 10 16:51:04 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802657/real 1562802657] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802664 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 16:51:11 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802664/real 1562802664] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802671 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:51:12 fir-md1-s1 kernel: Lustre: 97661:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2126bc6600 x1631551896846384/t0(0) o36->6b95ce2a-f8e3-6a6f-1394-30bdffccf512@10.9.105.51@o2ib4:17/0 lens 504/2888 e 1 to 0 dl 1562802677 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 16:51:12 fir-md1-s1 kernel: Lustre: 97661:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 579 previous similar messages Jul 10 16:51:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 16:51:18 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 16:51:18 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802671/real 1562802671] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802678 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:51:25 
fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802678/real 1562802678] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802685 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:51:32 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802685/real 1562802685] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802692 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:51:46 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802699/real 1562802699] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802706 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:51:46 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 10 16:52:07 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802720/real 1562802720] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802727 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:52:07 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 10 16:52:08 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 16:52:08 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 687 previous similar messages Jul 10 16:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae 
(at 10.8.30.23@o2ib6) Jul 10 16:52:10 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 10 16:52:42 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562802755/real 1562802755] req@ffff8f1e43d69500 x1636729669870912/t0(0) o104->fir-MDT0002@10.9.106.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562802762 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 16:52:42 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 10 16:53:31 fir-md1-s1 kernel: LustreError: 21446:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.8@o2ib4) failed to reply to blocking AST (req@ffff8f1e43d69500 x1636729669870912 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2f73dc0d80/0x5d9ee639672d158c lrc: 4/0,0 mode: PR/PR res: [0x2c002c429:0x7:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.106.8@o2ib4 remote: 0x9d2381126b1c50d2 expref: 387 pid: 21677 timeout: 1918013 lvb_type: 0 Jul 10 16:53:31 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.106.8@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 10 16:53:31 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.106.8@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f73dc0d80/0x5d9ee639672d158c lrc: 3/0,0 mode: PR/PR res: [0x2c002c429:0x7:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.106.8@o2ib4 remote: 0x9d2381126b1c50d2 expref: 388 pid: 21677 timeout: 0 lvb_type: 0 Jul 10 16:53:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dda58b78-27c3-1b63-d778-dfc595795aab (at 10.8.30.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2501a99800, cur 1562802820 expire 1562802670 last 1562802593 Jul 10 16:55:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 16:55:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 16:57:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 16:57:14 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 17:01:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 17:01:23 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 10 17:02:09 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 17:02:09 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 738 previous similar messages Jul 10 17:02:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 17:02:21 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 10 17:07:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 17:07:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 17:08:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 17:08:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 17:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 07c1712c-9739-2dce-4883-ed8d604a7bd1 (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251c28d800, cur 1562803728 expire 1562803578 last 1562803501 Jul 10 17:08:48 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 10 17:08:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 07c1712c-9739-2dce-4883-ed8d604a7bd1 (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a40bdc800, cur 1562803729 expire 1562803579 last 1562803502 Jul 10 17:08:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 17:11:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 17:11:28 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 17:12:11 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 17:12:11 fir-md1-s1 kernel: LustreError: 21298:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 610 previous similar messages Jul 10 17:12:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 17:12:30 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 10 17:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 17:17:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 17:22:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 17:22:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 17:22:11 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 17:22:11 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 679 previous similar messages Jul 10 17:22:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 59d57404-d19d-2713-ed04-b4a9aba223b9 (at 10.8.25.19@o2ib6) Jul 10 17:22:30 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 10 17:22:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 17:22:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 17:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 17:27:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 17:32:15 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 17:32:15 fir-md1-s1 kernel: LustreError: 46562:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 652 previous similar messages Jul 10 17:32:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 17:32:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 17:32:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 17:32:43 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 10 17:33:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 10 17:33:04 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 17:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 17:37:47 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 17:42:18 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 17:42:18 fir-md1-s1 kernel: LustreError: 21737:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 577 previous similar messages Jul 10 17:42:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 17:42:46 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 10 17:43:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 17:43:05 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 10 17:47:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 17:47:49 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 10 17:51:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 17:51:16 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 17:52:25 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 17:52:25 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 607 previous similar messages Jul 10 17:52:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 17:52:47 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 10 17:53:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 17:53:11 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 10 17:58:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 17:58:16 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 10 18:02:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 18:02:26 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 18:02:30 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 18:02:30 fir-md1-s1 kernel: LustreError: 44037:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 600 previous similar messages Jul 10 18:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 18:02:57 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 10 18:04:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 18:04:06 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 10 18:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 18:09:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 18:12:33 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 18:12:33 fir-md1-s1 kernel: LustreError: 56756:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 587 previous similar messages Jul 10 18:12:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bb6c1ebe-228f-c2b0-845a-14ae6de0b327 (at 10.8.27.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24efec2000, cur 1562807578 expire 1562807428 last 1562807351 Jul 10 18:13:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 18:13:04 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 10 18:13:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 18:13:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 18:14:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 18:14:50 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 18:20:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 18:20:07 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 18:21:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 214ba30a-c145-16d9-1a66-918c9f83d9e3 (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e37f0e000, cur 1562808070 expire 1562807920 last 1562807843 Jul 10 18:21:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 18:22:36 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 18:22:36 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 622 previous similar messages Jul 10 18:23:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 18:23:10 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 10 18:24:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 18:24:53 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 10 18:26:03 fir-md1-s1 kernel: Lustre: 22005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ea5198c00 x1634315021315568/t413200361866(0) o36->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:8/0 lens 488/3152 e 1 to 0 dl 1562808368 ref 2 fl 
Interpret:/0/0 rc 0/0 Jul 10 18:26:18 fir-md1-s1 kernel: Lustre: 23704:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562808366/real 1562808366] req@ffff8f34bdb34b00 x1636729757388096/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562808378 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 18:26:18 fir-md1-s1 kernel: Lustre: 23704:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 10 18:26:30 fir-md1-s1 kernel: Lustre: 23704:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562808378/real 1562808378] req@ffff8f34bdb34b00 x1636729757388096/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562808390 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 18:26:31 fir-md1-s1 kernel: Lustre: 23664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2ed7eb8c00 x1634315021315728/t413200363957(0) o36->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:6/0 lens 488/3152 e 0 to 0 dl 1562808396 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:26:42 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.29.1@o2ib6) failed to reply to blocking AST (req@ffff8f34bdb34b00 x1636729757388096 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1be7fa6c00/0x5d9ee63987a52a40 lrc: 4/0,0 mode: PR/PR res: [0x200029c10:0xe1b:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db68d4bec9 expref: 1561005 pid: 24581 timeout: 1923479 lvb_type: 0 Jul 10 18:26:42 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.29.1@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 10 18:26:42 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 
10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1be7fa6c00/0x5d9ee63987a52a40 lrc: 3/0,0 mode: PR/PR res: [0x200029c10:0xe1b:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db68d4bec9 expref: 1561002 pid: 24581 timeout: 0 lvb_type: 0 Jul 10 18:26:42 fir-md1-s1 kernel: LustreError: 21268:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808402 with bad export cookie 6746082289101563363 Jul 10 18:26:43 fir-md1-s1 kernel: LustreError: 20368:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808403 with bad export cookie 6746082289101563363 Jul 10 18:26:43 fir-md1-s1 kernel: LustreError: 20368:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 29 previous similar messages Jul 10 18:26:44 fir-md1-s1 kernel: LustreError: 31003:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808404 with bad export cookie 6746082289101563363 Jul 10 18:26:44 fir-md1-s1 kernel: LustreError: 31003:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 46 previous similar messages Jul 10 18:26:46 fir-md1-s1 kernel: LustreError: 31011:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808406 with bad export cookie 6746082289101563363 Jul 10 18:26:46 fir-md1-s1 kernel: LustreError: 31011:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 102 previous similar messages Jul 10 18:26:50 fir-md1-s1 kernel: LustreError: 20371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808410 with bad export cookie 6746082289101563363 Jul 10 18:26:50 fir-md1-s1 kernel: LustreError: 20371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 165 previous similar messages Jul 10 18:26:58 fir-md1-s1 kernel: LustreError: 25075:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808418 with bad export cookie 
6746082289101563363 Jul 10 18:26:58 fir-md1-s1 kernel: LustreError: 25075:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 381 previous similar messages Jul 10 18:27:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 18:27:06 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 18:27:14 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.1@o2ib6 arrived at 1562808434 with bad export cookie 6746082289101563363 Jul 10 18:27:14 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 810 previous similar messages Jul 10 18:28:12 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808402, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2e9c32b840/0x5d9ee6398d9c6f1f lrc: 3/0,1 mode: --/PW res: [0x200029c10:0xe1b:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23704 timeout: 0 lvb_type: 0 Jul 10 18:28:55 fir-md1-s1 kernel: Lustre: 23617:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f10b4e15400 x1636718262757968/t0(0) o101->1b90433c-235e-7531-cfe6-8ebc9f785a9b@10.9.0.64@o2ib4:0/0 lens 600/3264 e 0 to 0 dl 1562808540 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:29:26 fir-md1-s1 kernel: LNet: Service thread pid 23704 was inactive for 200.23s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 10 18:29:26 fir-md1-s1 kernel: Pid: 23704, comm: mdt02_079 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 10 18:29:26 fir-md1-s1 kernel: Call Trace: Jul 10 18:29:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 10 18:29:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 10 18:29:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 10 18:29:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 10 18:29:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 10 18:29:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 10 18:29:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 10 18:29:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 10 18:29:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562808566.23704 Jul 10 18:30:00 fir-md1-s1 kernel: LustreError: 23580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808510, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0984291680/0x5d9ee6398e3a7a93 lrc: 3/1,0 mode: --/PR res: [0x200029c10:0xe1b:0x0].0x0 bits 0x13/0x48 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23580 timeout: 0 lvb_type: 0 Jul 10 18:30:12 
fir-md1-s1 kernel: LustreError: 21412:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f34f3b1c200 x1636729761310096/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:30:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) reconnecting Jul 10 18:30:14 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 18:30:34 fir-md1-s1 kernel: LustreError: 23601:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f34a2a2c500 x1636729761545232/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:30:37 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f28722c5100 x1631603052224096/t0(0) o36->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:12/0 lens 488/3152 e 0 to 0 dl 1562808642 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:30:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e7bfccc80/0x5d9ee6394b77d89d lrc: 3/0,0 mode: PR/PR res: [0x20002993f:0x1006:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db6257c5f0 expref: 702469 pid: 22005 timeout: 1923701 lvb_type: 0 Jul 10 18:30:54 fir-md1-s1 kernel: LustreError: 23745:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1e5dbe8000 x1636729761785872/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:31:19 fir-md1-s1 kernel: Lustre: 23664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2ca9e1a400 x1631603052233024/t413200456057(0) 
o36->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:24/0 lens 488/3152 e 0 to 0 dl 1562808684 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:31:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2e15625340/0x5d9ee6398bc44d4b lrc: 3/0,0 mode: PR/PR res: [0x20002993f:0x181f:0x0].0x0 bits 0x1b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db6af89600 expref: 630078 pid: 23714 timeout: 1923743 lvb_type: 0 Jul 10 18:31:25 fir-md1-s1 kernel: LustreError: 20463:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2fc5c1ce00 x1636729762168304/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:31:37 fir-md1-s1 kernel: LustreError: 26258:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f206bb10c00 x1636729762303536/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:31:42 fir-md1-s1 kernel: LustreError: 21412:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808612, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f11c10fc0/0x5d9ee6398ef5837a lrc: 3/0,1 mode: --/PW res: [0x20002993f:0x1006:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21412 timeout: 0 lvb_type: 0 Jul 10 18:31:50 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2ed15e7500 x1631603052238544/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:25/0 lens 480/568 e 0 to 0 dl 1562808715 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:31:51 fir-md1-s1 kernel: LNet: Service thread pid 23580 was inactive for 200.38s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 10 18:31:51 fir-md1-s1 kernel: Pid: 23580, comm: mdt00_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 10 18:31:51 fir-md1-s1 kernel: Call Trace: Jul 10 18:31:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 10 18:31:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 10 18:31:51 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Jul 10 18:31:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Jul 10 18:31:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 10 18:31:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 10 18:31:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 10 18:31:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 10 18:31:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 10 18:31:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 10 18:31:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562808711.23580 Jul 10 18:31:54 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2fcdc2c800/0x5d9ee6398c365e65 lrc: 3/0,0 mode: PW/PW res: [0x20002993f:0x1816:0x0].0x0 bits 0x40/0x0 rrc: 10 type: 
IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db6b219522 expref: 580067 pid: 23745 timeout: 1923774 lvb_type: 0 Jul 10 18:32:06 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1fa46c7500/0x5d9ee6398c0e7aed lrc: 3/0,0 mode: PR/PR res: [0x20002993f:0x181c:0x0].0x0 bits 0x1b/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db6b129d27 expref: 561388 pid: 24584 timeout: 1923786 lvb_type: 0 Jul 10 18:32:08 fir-md1-s1 kernel: LustreError: 20463:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2fc5c1cb00 x1636729762672288/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:32:24 fir-md1-s1 kernel: LustreError: 23745:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808654, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2aeefe2880/0x5d9ee6398f2a8ca9 lrc: 3/0,1 mode: --/PW res: [0x20002993f:0x181f:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23745 timeout: 0 lvb_type: 0 Jul 10 18:32:36 fir-md1-s1 kernel: LustreError: 21290:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 18:32:36 fir-md1-s1 kernel: LustreError: 21290:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 487 previous similar messages Jul 10 18:32:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f14ff3df2c0/0x5d9ee6398c379831 lrc: 3/0,0 mode: PR/PR res: [0x20002993f:0x1816:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 
remote: 0x3ac5b6db6b2217c1 expref: 514659 pid: 23750 timeout: 1923817 lvb_type: 0 Jul 10 18:32:52 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-24), not sending early reply req@ffff8f3416a36c00 x1631603052245472/t413200474285(0) o36->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:27/0 lens 488/3152 e 0 to 0 dl 1562808777 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:32:54 fir-md1-s1 kernel: LustreError: 20728:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1ea519da00 x1636729763178096/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:33:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Jul 10 18:33:21 fir-md1-s1 kernel: Lustre: Skipped 136 previous similar messages Jul 10 18:33:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1fed675a00/0x5d9ee639851dd129 lrc: 3/0,0 mode: PR/PR res: [0x200029939:0xb6d:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db678b7e95 expref: 448900 pid: 20731 timeout: 1923863 lvb_type: 0 Jul 10 18:33:33 fir-md1-s1 kernel: LNet: Service thread pid 21412 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 10 18:33:33 fir-md1-s1 kernel: Pid: 21412, comm: mdt02_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 10 18:33:33 fir-md1-s1 kernel: Call Trace: Jul 10 18:33:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 10 18:33:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 10 18:33:33 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 10 18:33:33 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 10 18:33:33 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 10 18:33:33 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 10 18:33:33 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 10 18:33:33 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 10 18:33:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562808813.21412 Jul 10 18:33:38 fir-md1-s1 kernel: LustreError: 20463:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808728, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f298b332ac0/0x5d9ee6398f8f2c8d lrc: 3/0,1 mode: --/PW res: [0x20002993f:0x1816:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20463 timeout: 0 lvb_type: 0 Jul 10 18:33:58 
fir-md1-s1 kernel: LNet: Service thread pid 21412 completed after 225.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 10 18:34:01 fir-md1-s1 kernel: LustreError: 23738:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1e5dbeb300 x1636729763835648/t0(0) o104->fir-MDT0000@10.8.29.1@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 10 18:34:24 fir-md1-s1 kernel: LustreError: 20728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808774, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f17c11fe9c0/0x5d9ee6398fc64c5c lrc: 3/0,1 mode: --/EX res: [0x200029939:0xb6d:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20728 timeout: 0 lvb_type: 0 Jul 10 18:34:25 fir-md1-s1 kernel: Lustre: 25675:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f261fb1d400 x1631603052275760/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:0/0 lens 1776/3288 e 0 to 0 dl 1562808870 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 18:34:25 fir-md1-s1 kernel: Lustre: 25675:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 10 18:34:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2fe2c85a00/0x5d9ee6397543f6cc lrc: 3/0,0 mode: PR/PR res: [0x200020f18:0x1303e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6db6308e65a expref: 359372 pid: 23750 timeout: 1923930 lvb_type: 0 Jul 10 18:35:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 18:35:20 
fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 10 18:35:28 fir-md1-s1 kernel: LNet: Service thread pid 20463 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 10 18:35:28 fir-md1-s1 kernel: Pid: 20463, comm: mdt02_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 10 18:35:28 fir-md1-s1 kernel: Call Trace: Jul 10 18:35:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 10 18:35:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 10 18:35:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 10 18:35:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 10 18:35:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 10 18:35:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 10 18:35:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 10 18:35:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 10 18:35:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562808928.20463 Jul 10 18:36:11 fir-md1-s1 kernel: LNet: Service thread pid 23704 completed after 605.15s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 10 18:36:11 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 10 18:36:32 fir-md1-s1 kernel: LustreError: 24583:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562808902, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1f99dc7500/0x5d9ee639909a6dfa lrc: 3/0,1 mode: --/PW res: [0x20002993f:0x1817:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24583 timeout: 0 lvb_type: 0 Jul 10 18:38:23 fir-md1-s1 kernel: LNet: Service thread pid 24583 was inactive for 200.60s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 10 18:38:23 fir-md1-s1 kernel: Pid: 24583, comm: mdt01_061 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 10 18:38:23 fir-md1-s1 kernel: Call Trace: Jul 10 18:38:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 10 18:38:23 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 10 18:38:23 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 10 18:38:23 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 10 18:38:23 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 10 18:38:23 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 10 18:38:23 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 10 18:38:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 10 18:38:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 10 18:38:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 10 18:38:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562809103.24583 Jul 10 18:39:04 fir-md1-s1 kernel: LNet: Service thread pid 24583 completed after 241.73s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 10 18:40:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 18:40:17 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 10 18:42:48 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 18:42:48 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 671 previous similar messages Jul 10 18:43:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 18:43:23 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 10 18:45:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 18:45:22 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 10 18:46:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 18:46:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 18:50:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 18:50:23 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 10 18:52:50 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 18:52:50 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 730 previous similar messages Jul 10 18:53:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 18:53:31 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 10 18:56:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 10 18:56:13 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 10 18:57:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 18:57:46 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 19:00:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 19:00:53 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 19:02:54 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 19:02:54 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 636 previous similar messages Jul 10 19:03:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 19:03:37 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 10 19:06:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 19:06:21 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 10 19:10:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 19:10:59 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 19:12:56 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 19:12:56 fir-md1-s1 kernel: LustreError: 46531:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 747 previous similar messages Jul 10 19:13:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 19:13:38 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 10 19:14:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 19:16:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 19:16:25 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 10 19:21:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 19:21:32 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 19:23:08 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 19:23:08 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 617 previous similar messages Jul 10 19:23:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 19:23:47 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 10 19:24:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 19:24:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 19:26:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 10 19:26:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 19:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 19:31:37 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 10 19:33:10 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 19:33:10 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 747 previous similar messages Jul 10 19:33:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 19:33:54 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 10 19:36:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 19:36:33 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 19:40:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5282ca62-d94c-33fd-9d61-31ebcd98e0af (at 10.9.116.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148dddc400, cur 1562812823 expire 1562812673 last 1562812596 Jul 10 19:40:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 19:41:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 19:41:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 19:41:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 19:41:53 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 19:43:12 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 19:43:12 fir-md1-s1 kernel: LustreError: 46524:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 667 previous similar messages Jul 10 19:43:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 19:43:59 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 10 19:46:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 19:46:55 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 10 19:51:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 19:52:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 19:52:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 19:53:14 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 19:53:14 fir-md1-s1 kernel: LustreError: 46522:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 656 previous similar messages Jul 10 19:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 19:54:06 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 10 19:56:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 19:56:56 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 10 20:01:49 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562814102/real 1562814102] req@ffff8f2ed6939800 x1636729822352496/t0(0) o104->fir-MDT0002@10.9.115.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562814109 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 20:01:49 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 10 20:01:56 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562814109/real 1562814109] req@ffff8f2ed6939800 x1636729822352496/t0(0) o104->fir-MDT0002@10.9.115.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562814116 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 20:01:57 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f1953a700 x1631603054418480/t0(0) 
o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:2/0 lens 1784/3288 e 1 to 0 dl 1562814122 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 20:01:57 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 10 20:02:03 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562814116/real 1562814116] req@ffff8f2ed6939800 x1636729822352496/t0(0) o104->fir-MDT0002@10.9.115.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562814123 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 20:02:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 10 20:02:09 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 10 20:02:17 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562814130/real 1562814130] req@ffff8f2ed6939800 x1636729822352496/t0(0) o104->fir-MDT0002@10.9.115.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562814137 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 20:02:17 fir-md1-s1 kernel: Lustre: 21003:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 10 20:02:17 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.115.13@o2ib4) failed to reply to blocking AST (req@ffff8f2ed6939800 x1636729822352496 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f0a0118a1c0/0x5d9ee639a3c8f1ab lrc: 4/0,0 mode: PR/PR res: [0x2c002c3a1:0xcc22:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.115.13@o2ib4 remote: 0xb3c53fda679b06b6 expref: 1833 pid: 23555 timeout: 1929219 lvb_type: 0 Jul 10 20:02:17 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.115.13@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 10 20:02:17 fir-md1-s1 kernel: 
LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.115.13@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0a0118a1c0/0x5d9ee639a3c8f1ab lrc: 3/0,0 mode: PR/PR res: [0x2c002c3a1:0xcc22:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.115.13@o2ib4 remote: 0xb3c53fda679b06b6 expref: 1834 pid: 23555 timeout: 0 lvb_type: 0 Jul 10 20:02:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 10 20:03:16 fir-md1-s1 kernel: LustreError: 81718:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 20:03:16 fir-md1-s1 kernel: LustreError: 81718:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 662 previous similar messages Jul 10 20:04:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 20:04:07 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 10 20:05:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 19195713-2529-2820-f0a1-33d24d172ab7 (at 10.9.115.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2a71eab800, cur 1562814319 expire 1562814169 last 1562814092 Jul 10 20:05:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 10 20:07:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 20:07:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 20:08:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 20:08:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 20:12:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 20:12:16 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 10 20:13:19 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 20:13:19 fir-md1-s1 kernel: LustreError: 46510:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 658 previous similar messages Jul 10 20:14:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 20:14:11 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 10 20:19:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 20:19:16 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 10 20:19:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 20:19:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 20:22:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 20:22:17 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 20:23:21 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 20:23:21 fir-md1-s1 kernel: LustreError: 46553:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 620 previous similar messages Jul 10 20:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 20:24:20 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 10 20:29:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 20:29:17 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 20:31:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 20:31:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 20:32:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0a2c93c8-5b84-dccd-112a-6823da10a94a (at 10.9.116.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f07d77b0000, cur 1562815954 expire 1562815804 last 1562815727 Jul 10 20:32:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 20:32:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 20:32:49 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 10 20:33:24 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 20:33:24 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 474 previous similar messages Jul 10 20:34:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 20:34:23 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 10 20:39:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 20:39:21 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 10 20:42:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 20:42:36 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 10 20:43:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 20:43:06 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 10 20:43:24 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 20:43:24 fir-md1-s1 kernel: LustreError: 79335:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 522 previous similar messages Jul 10 20:44:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 20:44:24 fir-md1-s1 kernel: Lustre: Skipped 130 previous similar messages Jul 10 20:49:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 20:49:25 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 10 20:52:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 20:52:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 20:53:26 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 20:53:26 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 555 previous similar messages Jul 10 20:53:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 20:53:28 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 20:54:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 20:54:25 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 10 21:02:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 21:02:02 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 10 21:03:32 fir-md1-s1 kernel: LustreError: 22059:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 21:03:32 fir-md1-s1 kernel: LustreError: 22059:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 574 previous similar messages Jul 10 21:04:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 21:04:19 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 21:04:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 21:04:44 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 10 21:05:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 21:05:15 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 10 21:13:48 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 21:13:48 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 590 previous similar messages Jul 10 21:14:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 21:14:09 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 10 21:14:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 21:14:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 21:15:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 21:15:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 21:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 21:15:49 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 10 21:21:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2503a3ec00, cur 1562818910 expire 1562818760 last 1562818683 Jul 10 21:23:50 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 21:23:50 fir-md1-s1 kernel: LustreError: 21544:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 589 previous similar messages Jul 10 21:24:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 21:24:26 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 10 21:26:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 21:26:08 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 21:26:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 21:26:08 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 10 21:26:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 21:26:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 21:33:53 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 10 21:33:53 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 619 previous similar messages Jul 10 21:34:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 21:34:27 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 21:36:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 21:36:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 21:36:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 21:36:10 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 10 21:38:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 21:43:13 fir-md1-s1 kernel: Lustre: 22007:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562820186/real 1562820186] req@ffff8f162cf13900 x1636729886578256/t0(0) o104->fir-MDT0002@10.8.16.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562820193 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 21:43:18 fir-md1-s1 kernel: Lustre: 23621:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562820191/real 1562820191] req@ffff8f369fa1b600 x1636729886638064/t0(0) o104->fir-MDT0002@10.8.16.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562820198 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 21:43:21 fir-md1-s1 kernel: Lustre: 22006:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d509fb600 x1638719331699472/t0(0) o101->957c1ad0-d547-b44d-0f14-5f92c3213a3d@10.8.15.3@o2ib6:26/0 lens 1784/3288 e 1 to 0 dl 1562820206 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 21:43:25 fir-md1-s1 kernel: Lustre: 23621:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562820198/real 1562820198] req@ffff8f369fa1b600 x1636729886638064/t0(0) o104->fir-MDT0002@10.8.16.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562820205 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 21:43:25 fir-md1-s1 kernel: Lustre: 23621:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 10 21:43:26 fir-md1-s1 kernel: Lustre: 21423:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3c16bf4800 x1638711356333600/t0(0) o101->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:1/0 lens 1784/3288 e 1 to 0 dl 1562820211 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 21:43:34 fir-md1-s1 kernel: Lustre: 23617:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562820207/real 1562820207] req@ffff8f0c80f14b00 
x1636729886710368/t0(0) o104->fir-MDT0002@10.8.16.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562820214 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 21:43:34 fir-md1-s1 kernel: Lustre: 23617:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 10 21:43:41 fir-md1-s1 kernel: LustreError: 22007:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.16.5@o2ib6) failed to reply to blocking AST (req@ffff8f162cf13900 x1636729886578256 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f1a93b30b40/0x5d9ee639a990f2f5 lrc: 4/0,0 mode: PR/PR res: [0x2c002c3a1:0xc980:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.16.5@o2ib6 remote: 0xe66f7278c9b52327 expref: 3088 pid: 97661 timeout: 1935303 lvb_type: 0 Jul 10 21:43:41 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.16.5@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 10 21:43:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.16.5@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1a93b30b40/0x5d9ee639a990f2f5 lrc: 3/0,0 mode: PR/PR res: [0x2c002c3a1:0xc980:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.16.5@o2ib6 remote: 0xe66f7278c9b52327 expref: 3089 pid: 97661 timeout: 0 lvb_type: 0 Jul 10 21:44:03 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 21:44:03 fir-md1-s1 kernel: LustreError: 21514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 672 previous similar messages Jul 10 21:44:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 21:44:40 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 10 21:46:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 21:46:20 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 21:46:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 21:46:20 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 21:46:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1fb1c1bc-a5c2-7639-1248-10341b490c82 (at 10.8.16.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ec9dbc00, cur 1562820397 expire 1562820247 last 1562820170 Jul 10 21:49:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 21:49:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 21:54:08 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 21:54:08 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 588 previous similar messages Jul 10 21:54:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 10 21:54:56 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 10 21:56:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 21:56:31 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 10 21:56:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 10 21:56:31 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 10 21:58:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 
17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fcc7c4000, cur 1562821134 expire 1562820984 last 1562820907 Jul 10 21:58:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 22:00:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 22:00:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 22:04:10 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 22:04:10 fir-md1-s1 kernel: LustreError: 21683:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 649 previous similar messages Jul 10 22:06:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 22:06:31 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 22:06:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 22:06:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 22:06:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 22:06:39 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 10 22:11:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 22:11:02 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 10 22:13:37 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ee0d44e00 x1638719625554688/t0(0) o101->957c1ad0-d547-b44d-0f14-5f92c3213a3d@10.8.15.3@o2ib6:12/0 lens 376/1600 e 1 to 0 dl 1562822022 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 22:13:51 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.3@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0640def2c0/0x5d9ee639bdc0fe27 lrc: 3/0,0 mode: PR/PR res: [0x2c002c443:0x17f:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.15.3@o2ib6 remote: 0xc36b1972b7953a19 expref: 204 pid: 23586 timeout: 1937091 lvb_type: 0 Jul 10 22:13:51 fir-md1-s1 kernel: LustreError: 23738:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1e08449800 ns: mdt-fir-MDT0002_UUID lock: ffff8f1460fad7c0/0x5d9ee639bdc0ffb6 lrc: 3/0,0 mode: EX/EX res: [0x2c002c443:0x17f:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x50000000000000 nid: 10.8.15.3@o2ib6 remote: 0xc36b1972b7953a27 expref: 116 pid: 23738 timeout: 0 lvb_type: 3 Jul 10 22:13:51 fir-md1-s1 kernel: Lustre: 23738:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. 
req@ffff8f2ee0d44e00 x1638719625554688/t351778161854(0) o101->957c1ad0-d547-b44d-0f14-5f92c3213a3d@10.8.15.3@o2ib6:12/0 lens 376/1568 e 1 to 0 dl 1562822022 ref 1 fl Complete:/0/0 rc -107/-107 Jul 10 22:14:11 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 22:14:11 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 634 previous similar messages Jul 10 22:15:57 fir-md1-s1 kernel: Lustre: 10502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562822149/real 1562822149] req@ffff8f0db72aef00 x1636729907279168/t0(0) o104->fir-MDT0002@10.9.113.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562822156 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 10 22:15:57 fir-md1-s1 kernel: Lustre: 10502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 10 22:16:04 fir-md1-s1 kernel: Lustre: 26258:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562822156/real 1562822156] req@ffff8f1d0705b000 x1636729907279552/t0(0) o104->fir-MDT0002@10.9.113.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562822163 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 22:16:04 fir-md1-s1 kernel: Lustre: 26258:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 10 22:16:04 fir-md1-s1 kernel: Lustre: 10198:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e36a3fb00 x1635100579630112/t0(0) o101->81c79b6e-3061-2fda-8521-bc0b462e4ff6@10.9.113.13@o2ib4:9/0 lens 1784/3288 e 1 to 0 dl 1562822169 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 22:16:05 fir-md1-s1 kernel: Lustre: 23708:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c20893600 x1635100579630176/t0(0) 
o101->81c79b6e-3061-2fda-8521-bc0b462e4ff6@10.9.113.13@o2ib4:10/0 lens 1784/3288 e 1 to 0 dl 1562822170 ref 2 fl Interpret:/0/0 rc 0/0 Jul 10 22:16:11 fir-md1-s1 kernel: Lustre: 10502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562822164/real 1562822164] req@ffff8f0db72aef00 x1636729907279168/t0(0) o104->fir-MDT0002@10.9.113.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562822171 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 22:16:11 fir-md1-s1 kernel: Lustre: 10502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 10 22:16:25 fir-md1-s1 kernel: Lustre: 26258:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562822178/real 1562822178] req@ffff8f1d0705b000 x1636729907279552/t0(0) o104->fir-MDT0002@10.9.113.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562822185 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 10 22:16:25 fir-md1-s1 kernel: LustreError: 10502:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.113.3@o2ib4) failed to reply to blocking AST (req@ffff8f0db72aef00 x1636729907279168 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f20c7640900/0x5d9ee639b06fbfb3 lrc: 4/0,0 mode: PR/PR res: [0x2c002c3a1:0xcb52:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.113.3@o2ib4 remote: 0xf46aef741591092c expref: 4407 pid: 21482 timeout: 1937267 lvb_type: 0 Jul 10 22:16:25 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.113.3@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 10 22:16:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.9.113.3@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f20c7640900/0x5d9ee639b06fbfb3 lrc: 3/0,0 mode: PR/PR res: [0x2c002c3a1:0xcb52:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.113.3@o2ib4 remote: 
0xf46aef741591092c expref: 4408 pid: 21482 timeout: 0 lvb_type: 0 Jul 10 22:16:25 fir-md1-s1 kernel: Lustre: 26258:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 10 22:16:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 10 22:16:40 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 22:16:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 22:16:40 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 10 22:18:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 22:18:00 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 10 22:18:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd205000, cur 1562822336 expire 1562822186 last 1562822109 Jul 10 22:24:15 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 22:24:15 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 639 previous similar messages Jul 10 22:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 22:26:42 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 10 22:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 22:26:42 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 10 22:27:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 22:27:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 10 22:29:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 22:29:01 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 10 22:34:17 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 22:34:17 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 715 previous similar messages Jul 10 22:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 22:36:48 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 10 22:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 10 22:36:48 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 10 22:39:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 22:39:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 22:39:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 10 22:39:24 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 10 22:44:19 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 22:44:19 fir-md1-s1 kernel: LustreError: 24213:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 624 previous similar messages Jul 10 22:46:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 22:46:51 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 10 22:47:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 22:47:00 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 22:50:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 22:50:03 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 10 22:54:21 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 22:54:21 fir-md1-s1 kernel: LustreError: 21617:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 600 previous similar messages Jul 10 22:56:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 22:56:51 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 22:56:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 22:56:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 22:57:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 22:57:02 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 10 23:00:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 23:00:07 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 10 23:04:23 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 23:04:23 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 696 previous similar messages Jul 10 23:07:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 10 23:07:00 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 10 23:07:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 23:07:13 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 10 23:10:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 23:10:41 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 10 23:12:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 23:12:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 10 23:14:25 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 23:14:25 fir-md1-s1 kernel: LustreError: 21708:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 663 previous similar messages Jul 10 23:17:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 23:17:05 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 10 23:17:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 23:17:20 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 10 23:20:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 23:20:47 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 10 23:22:38 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client cc0184b4-423e-d61b-ff8b-e62121180b57 (at 10.9.113.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f153524d400, cur 1562826158 expire 1562826008 last 1562825931 Jul 10 23:22:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 23:22:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 92ffa420-d747-a973-baf2-68cec64e7e81 (at 10.9.113.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1488f49400, cur 1562826160 expire 1562826010 last 1562825933 Jul 10 23:22:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 23:23:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 153 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3ae6b90000, cur 1562826234 expire 1562826084 last 1562826081 Jul 10 23:24:27 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 23:24:27 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 617 previous similar messages Jul 10 23:25:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client dd74c27b-57ee-efbf-9952-be3ffdfb9c30 (at 10.9.114.4@o2ib4) in 175 seconds. I think it's dead, and I am evicting it. exp ffff8f252eb47400, cur 1562826310 expire 1562826160 last 1562826135 Jul 10 23:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d4242da5-5a9c-4508-f9da-c1e7f36347f4 (at 10.9.114.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252173d000, cur 1562826348 expire 1562826198 last 1562826121 Jul 10 23:25:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 23:26:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d4242da5-5a9c-4508-f9da-c1e7f36347f4 (at 10.9.114.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45182c8800, cur 1562826362 expire 1562826212 last 1562826135 Jul 10 23:26:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 23:27:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 23:27:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 23:27:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 10 23:27:07 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 23:27:07 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 23:27:07 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 10 23:27:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 10 23:27:44 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 10 23:34:30 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 23:34:30 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 618 previous similar messages Jul 10 23:36:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 23:36:21 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 10 23:37:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3c16ae5000, cur 1562827031 expire 1562826881 last 1562826804 Jul 10 23:37:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 23:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 10 23:37:15 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 10 23:37:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 10 23:37:46 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 10 23:44:34 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 23:44:34 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 529 previous similar messages Jul 10 23:46:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 10 23:46:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 10 23:47:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 23:47:19 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 10 23:47:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 23:47:47 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 10 23:47:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 23:47:58 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 10 23:49:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 10 23:52:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 10 23:52:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 10 23:54:36 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 10 23:54:36 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 465 previous similar messages Jul 10 23:54:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8f13c09d-70f2-3426-bcdb-b5b12d23066d (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f5bb8000, cur 1562828088 expire 1562827938 last 1562827861 Jul 10 23:55:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8f13c09d-70f2-3426-bcdb-b5b12d23066d (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f27cb400, cur 1562828102 expire 1562827952 last 1562827875 Jul 10 23:55:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 10 23:57:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 10 23:57:02 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 10 23:57:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 10 23:57:19 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 10 23:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 10 23:57:53 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 10 23:58:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 00:04:41 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 00:04:41 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 498 previous similar messages Jul 11 00:07:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 00:07:32 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 11 00:08:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 11 00:08:00 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 00:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 00:08:15 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 11 00:14:42 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 00:14:42 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 483 previous similar messages Jul 11 00:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 00:17:37 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 11 00:18:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 00:18:13 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 00:18:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 00:18:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 00:24:43 fir-md1-s1 kernel: LustreError: 
44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 00:24:43 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 524 previous similar messages Jul 11 00:27:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 00:27:53 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 11 00:28:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 00:28:20 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 11 00:29:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 00:29:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 00:33:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 00:33:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 00:34:46 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 00:34:46 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 557 previous similar messages Jul 11 00:37:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 00:37:56 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 11 00:38:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 11 00:38:29 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 11 00:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 00:39:43 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 00:42:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 00:44:47 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 00:44:47 fir-md1-s1 kernel: LustreError: 21485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 549 previous similar messages Jul 11 00:47:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 00:47:56 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 11 00:49:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 00:49:02 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 00:49:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 00:49:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 00:50:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 00:50:23 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 11 00:54:49 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 11 00:54:49 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 642 previous similar messages Jul 11 00:58:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 00:58:10 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 01:00:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 01:00:49 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 11 
01:00:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 01:00:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 11 01:04:57 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 01:04:57 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 607 previous similar messages Jul 11 01:05:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:05:29 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 01:07:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:08:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 01:08:19 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 11 01:09:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 01:10:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 01:10:54 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 11 01:11:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 01:11:20 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 01:15:00 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 01:15:00 fir-md1-s1 kernel: LustreError: 46535:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 607 previous similar messages Jul 11 01:15:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:18:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 01:18:28 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 11 01:20:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 01:20:57 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 11 01:23:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 01:23:01 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 01:25:03 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 01:25:03 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 664 previous similar messages Jul 11 01:28:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 01:28:31 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 01:31:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 01:31:55 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 11 01:32:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:32:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 01:33:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:33:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 01:33:31 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 01:35:10 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 01:35:10 fir-md1-s1 kernel: LustreError: 25630:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 549 previous similar messages Jul 11 01:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f198ef57400, cur 1562834151 expire 1562834001 last 1562833924 Jul 11 01:38:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 01:38:37 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 01:39:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:42:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 01:42:05 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 11 01:42:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 01:44:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 01:44:01 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 01:45:12 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 01:45:12 fir-md1-s1 kernel: LustreError: 22670:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 578 previous similar messages Jul 11 01:47:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 01:47:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 01:48:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 01:48:58 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 11 01:52:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 01:52:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 01:54:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 01:54:02 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 01:55:15 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 01:55:15 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 500 previous similar messages Jul 11 01:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 01:59:07 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 11 01:59:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 01:59:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 02:02:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 02:02:26 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 02:04:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 02:04:08 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 11 02:05:17 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 02:05:17 fir-md1-s1 kernel: LustreError: 46545:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 500 previous similar messages Jul 11 02:09:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 02:09:11 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 11 02:11:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 02:11:53 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 02:13:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 02:13:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 02:14:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 02:14:14 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 11 02:15:25 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 11 02:15:25 fir-md1-s1 kernel: LustreError: 20505:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 507 previous similar messages Jul 11 02:19:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 02:19:15 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 11 02:22:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 02:22:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 02:23:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 02:23:12 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 11 02:24:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 02:24:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 02:25:25 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 02:25:25 fir-md1-s1 kernel: LustreError: 21685:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 639 previous similar messages Jul 11 02:29:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 02:29:36 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 11 02:33:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 02:33:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 02:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 02:33:24 fir-md1-s1 kernel: Lustre: Skipped 41087 previous similar messages Jul 11 02:35:26 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 02:35:26 fir-md1-s1 kernel: LustreError: 21711:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 627 previous similar messages Jul 11 02:36:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 02:36:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 02:37:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24cb033c00, cur 1562837824 expire 1562837674 last 1562837597 Jul 11 02:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 02:39:42 fir-md1-s1 kernel: Lustre: Skipped 41111 previous similar messages Jul 11 02:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 02:43:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 11 02:45:31 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 94208 GRANT, real grant 0 Jul 11 02:45:31 fir-md1-s1 kernel: LustreError: 46542:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 555 previous similar messages Jul 11 02:48:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 02:48:02 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 11 02:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 02:49:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 02:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 02:54:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 02:55:39 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 02:55:39 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 535 previous similar messages Jul 11 02:58:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b9620cc9-0642-09f3-d857-9cdbad9511de (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2533fe6400, cur 1562839101 expire 1562838951 last 1562838874 Jul 11 02:58:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 02:58:24 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 11 02:58:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 02:58:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 11 02:59:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 02:59:49 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 03:03:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 03:04:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 03:04:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 03:05:41 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 03:05:41 fir-md1-s1 kernel: LustreError: 22429:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 641 previous similar messages Jul 11 03:05:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 03:08:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 03:08:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 03:09:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 03:09:56 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 11 03:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 03:11:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 03:14:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 03:14:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 03:15:44 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 03:15:44 fir-md1-s1 kernel: LustreError: 21292:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 558 previous similar messages Jul 11 03:20:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 11 03:20:02 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 03:20:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 03:20:02 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 11 03:22:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 03:22:05 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 03:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 03:24:20 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 03:24:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cac1eba7-cdaa-957f-8735-d5169807717b (at 10.9.112.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3c662b5800, cur 1562840685 expire 1562840535 last 1562840458 Jul 11 03:24:45 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 11 03:25:45 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 03:25:45 fir-md1-s1 kernel: LustreError: 46541:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 509 previous similar messages Jul 11 03:30:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 03:30:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 03:30:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 03:30:25 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 11 03:32:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 03:32:59 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 03:35:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 03:35:02 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 03:35:52 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 28672 GRANT, real grant 0 Jul 11 03:35:52 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 537 previous similar messages Jul 11 03:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 03:40:25 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 03:42:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 03:42:42 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 11 03:43:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 03:43:01 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 03:45:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 03:45:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 03:45:56 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 03:45:56 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 554 previous similar messages Jul 11 03:50:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 03:50:28 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 11 03:53:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 03:53:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 03:54:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 03:54:53 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 11 03:55:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 03:55:35 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 03:55:58 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 03:55:58 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 504 previous similar messages Jul 11 04:00:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 04:00:39 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 11 04:04:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 74771de5-63d2-ad4d-0853-e29847bc9774 (at 10.9.116.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d1d67c800, cur 1562843064 expire 1562842914 last 1562842837 Jul 11 04:04:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 04:04:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 74771de5-63d2-ad4d-0853-e29847bc9774 (at 10.9.116.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22d6423c00, cur 1562843079 expire 1562842929 last 1562842852 Jul 11 04:04:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 04:05:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 04:05:38 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 04:05:58 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 04:05:58 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 391 previous similar messages Jul 11 04:06:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 04:06:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 04:06:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 04:06:50 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 04:09:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 92a5fc1a-0f67-1260-3d67-1ac1c4c2c6d6 (at 10.8.28.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25215c6400, cur 1562843396 expire 1562843246 last 1562843169 Jul 11 04:10:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 04:10:55 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 11 04:16:01 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 04:16:01 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 463 previous similar messages Jul 11 04:16:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 04:16:15 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 04:17:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 04:17:19 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 04:18:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 04:18:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 04:21:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 04:21:05 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 11 04:26:08 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 657250be-d5db-acec-954e-1239d7463eca claims 155648 GRANT, real grant 0 Jul 11 04:26:08 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 393 previous similar messages Jul 11 04:27:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 04:27:13 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 11 04:27:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 04:27:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 04:29:17 fir-md1-s1 kernel: Lustre: 50444:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bfea3bc00 x1631639284829952/t0(0) o36->e18301fc-f860-0db4-bf24-6c606e0cc839@10.8.8.31@o2ib6:22/0 lens 520/2888 e 1 to 0 dl 1562844562 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:29:24 fir-md1-s1 kernel: Lustre: 22288:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1c4ecb4800 x1638201205215568/t0(0) o36->5a22b190-14d6-e96a-6855-6cd3296f5726@10.9.104.72@o2ib4:29/0 lens 568/2888 e 1 to 0 dl 1562844569 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:29:29 fir-md1-s1 kernel: Lustre: 23729:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f44a3b27800 x1634080641904016/t0(0) o36->c6e3bcd8-71de-d683-20ac-e6684b91d659@10.9.108.10@o2ib4:4/0 lens 600/2888 e 0 to 0 dl 1562844574 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:29:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 04:29:32 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 11 04:29:34 fir-md1-s1 kernel: Lustre: 23726:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3fecb89e00 x1631580975217936/t0(0) o36->f070aa79-4085-01c4-e45c-5c90a853bda7@10.9.106.25@o2ib4:9/0 lens 552/2888 e 0 to 0 dl 1562844579 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:29:42 fir-md1-s1 kernel: Lustre: 22007:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1daa4fb600 x1631548097618640/t0(0) o36->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:17/0 lens 568/2888 e 1 to 0 dl 1562844587 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:30:04 fir-md1-s1 kernel: Lustre: 
22007:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1de7e53600 x1634130859459504/t0(0) o36->190e8c90-938d-b7f6-84df-7662b8e78e53@10.9.107.71@o2ib4:9/0 lens 584/2888 e 1 to 0 dl 1562844609 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:30:04 fir-md1-s1 kernel: Lustre: 22007:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 11 04:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Jul 11 04:30:32 fir-md1-s1 kernel: LustreError: 97661:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562844542, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f24ff7aad00/0x5d9ee63a066764a5 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 13 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee63a066764ac expref: -99 pid: 97661 timeout: 0 lvb_type: 0 Jul 11 04:30:34 fir-md1-s1 kernel: LustreError: 23663:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562844544, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f3d49bec380/0x5d9ee63a06688438 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 13 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee63a0668843f expref: -99 pid: 23663 timeout: 0 lvb_type: 0 Jul 11 04:30:39 fir-md1-s1 kernel: LustreError: 23587:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562844549, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f393cabe300/0x5d9ee63a066ae688 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 13 type: IBT flags: 0x1000001000000 nid: local remote: 
0x5d9ee63a066ae68f expref: -99 pid: 23587 timeout: 0 lvb_type: 0 Jul 11 04:30:45 fir-md1-s1 kernel: Lustre: 20728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2308d54b00 x1633774881944864/t0(0) o36->bb635275-94e4-0a1a-209a-677b17ce9a5a@10.9.104.50@o2ib4:20/0 lens 552/2888 e 0 to 0 dl 1562844650 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:30:45 fir-md1-s1 kernel: Lustre: 20728:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 11 04:30:57 fir-md1-s1 kernel: LustreError: 50444:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562844567, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f0a01e12400/0x5d9ee63a067236ad lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 13 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee63a067236b4 expref: -99 pid: 50444 timeout: 0 lvb_type: 0 Jul 11 04:30:57 fir-md1-s1 kernel: LustreError: 50444:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 11 04:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jul 11 04:31:08 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 11 04:31:10 fir-md1-s1 kernel: LustreError: 26258:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562844580, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f1bf44318c0/0x5d9ee63a0678cfe4 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 14 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee63a0678cfeb expref: -99 pid: 26258 timeout: 0 lvb_type: 0 Jul 11 04:31:10 fir-md1-s1 kernel: LustreError: 26258:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 
11 04:31:20 fir-md1-s1 kernel: LustreError: 20725:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562844589, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f064baa7500/0x5d9ee63a067caf60 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 14 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee63a067caf67 expref: -99 pid: 20725 timeout: 0 lvb_type: 0 Jul 11 04:31:20 fir-md1-s1 kernel: LustreError: 20725:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 11 04:31:25 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2b12e88600 x1638083510157968/t0(0) o36->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:0/0 lens 584/2888 e 0 to 0 dl 1562844690 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:31:25 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 11 04:37:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 04:37:24 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 11 04:37:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 04:37:58 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 11 04:39:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 04:39:43 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 11 04:41:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 04:41:13 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 11 04:47:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 04:47:26 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 11 04:48:12 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562845685/real 1562845685] req@ffff8f1bfea3c500 x1636730140589504/t0(0) o104->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562845692 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 04:48:19 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562845692/real 1562845692] req@ffff8f1bfea3c500 x1636730140589504/t0(0) o104->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562845699 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 04:48:20 fir-md1-s1 kernel: Lustre: 21675:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f442ca18c00 x1636296525541360/t0(0) o36->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:25/0 lens 488/3152 e 1 to 0 dl 1562845705 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 04:48:26 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562845699/real 1562845699] req@ffff8f1bfea3c500 x1636730140589504/t0(0) o104->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/224 e 
0 to 1 dl 1562845706 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 04:48:40 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562845713/real 1562845713] req@ffff8f1bfea3c500 x1636730140589504/t0(0) o104->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562845720 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 04:48:40 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 04:48:40 fir-md1-s1 kernel: LustreError: 23614:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.112.17@o2ib4) failed to reply to blocking AST (req@ffff8f1bfea3c500 x1636730140589504 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2f04b05a00/0x5d9ee639b06374d1 lrc: 4/0,0 mode: PR/PR res: [0x20002985c:0x475:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.9.112.17@o2ib4 remote: 0x8809107949661ab4 expref: 28 pid: 23628 timeout: 1960802 lvb_type: 0 Jul 11 04:48:40 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.112.17@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 11 04:48:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.112.17@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f2f04b05a00/0x5d9ee639b06374d1 lrc: 3/0,0 mode: PR/PR res: [0x20002985c:0x475:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.9.112.17@o2ib4 remote: 0x8809107949661ab4 expref: 29 pid: 23628 timeout: 0 lvb_type: 0 Jul 11 04:48:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 04:48:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 04:49:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 04:49:53 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 04:51:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 04:51:28 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 11 04:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ae10cc76-adf2-6fa2-11b9-b27d5e4703ab (at 10.9.112.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd379000, cur 1562845904 expire 1562845754 last 1562845677 Jul 11 04:51:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 04:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 04:57:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 04:59:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 04:59:00 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 11 05:01:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 05:01:01 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 11 05:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 05:01:33 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 11 05:08:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 05:08:12 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 11 05:09:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 05:09:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 05:11:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 05:11:08 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 05:11:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 05:11:36 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 11 05:18:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 05:18:17 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 05:21:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 05:21:12 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 11 05:21:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 05:21:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 05:21:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 05:21:37 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 11 05:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 05:28:18 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 05:31:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 05:31:19 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 05:31:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 05:31:31 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 05:31:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 05:31:37 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 11 05:38:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 05:38:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 05:41:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 05:41:40 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 11 05:41:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 05:41:40 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 11 05:42:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 05:42:36 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 05:46:56 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 11 05:46:56 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 11 05:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 05:48:40 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 05:51:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 05:51:57 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 05:51:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 11 05:51:57 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 11 05:55:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 05:55:20 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 05:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 05:58:47 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 06:02:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 06:02:02 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 11 06:04:35 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0d626ed000, cur 1562850275 expire 1562850125 last 1562850048 Jul 11 06:04:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 06:05:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 06:05:52 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 11 06:07:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 06:07:21 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 11 06:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 06:08:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 06:12:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 06:12:21 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 11 06:16:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 11 06:16:10 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 11 06:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 06:19:11 fir-md1-s1 kernel: Lustre: Skipped 445323 previous similar messages Jul 11 06:19:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 06:19:17 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 06:23:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 06:23:22 fir-md1-s1 kernel: Lustre: Skipped 445373 previous similar messages Jul 11 06:25:06 fir-md1-s1 kernel: Lustre: 23572:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562851499/real 1562851499] req@ffff8f0d034ad700 x1636730185487712/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562851506 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 06:25:13 fir-md1-s1 kernel: Lustre: 23572:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562851506/real 1562851506] req@ffff8f0d034ad700 x1636730185487712/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562851513 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 06:25:14 fir-md1-s1 kernel: Lustre: 10588:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f09d5525d00 x1637044331667008/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:19/0 lens 480/568 e 1 to 0 dl 1562851519 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 06:25:20 fir-md1-s1 kernel: Lustre: 23572:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562851513/real 1562851513] req@ffff8f0d034ad700 x1636730185487712/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562851520 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 06:25:20 fir-md1-s1 kernel: Lustre: 23572:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f09d5525d00 x1637044331667008/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:19/0 lens 480/536 e 1 to 0 dl 1562851519 ref 1 fl Complete:/0/0 rc 301/301 Jul 11 06:26:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 06:26:14 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 06:29:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 06:29:28 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 11 06:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 06:29:29 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 11 06:33:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 06:33:33 fir-md1-s1 kernel: Lustre: Skipped 164 previous similar messages Jul 11 06:36:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 06:36:34 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 06:39:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 06:39:31 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 06:39:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 06:39:35 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 06:40:32 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e006d8900 x1631353224125856/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 376/1600 e 1 to 0 dl 1562852437 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 06:41:47 fir-md1-s1 kernel: LustreError: 23627:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562852417, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f0c68d100/0x5d9ee63a16136f3f lrc: 3/0,1 mode: --/EX res: [0x200029d48:0x1:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23627 timeout: 0 lvb_type: 0 Jul 11 06:42:17 fir-md1-s1 kernel: Lustre: 23627:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (82:38s); client may timeout. 
req@ffff8f2e006d8900 x1631353224125856/t413216701507(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 376/1568 e 1 to 0 dl 1562852499 ref 1 fl Complete:/0/0 rc 0/0 Jul 11 06:43:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 06:43:36 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 11 06:47:55 fir-md1-s1 kernel: Lustre: 23708:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562852867/real 1562852867] req@ffff8f0e9f8c2100 x1636730195096224/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562852874 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 06:47:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 06:47:55 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 06:49:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 06:49:42 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 11 06:52:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 06:52:21 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 11 06:53:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 06:53:42 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 11 06:58:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 06:58:05 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 11 07:00:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 07:00:29 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 07:02:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 07:02:55 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 07:03:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 07:03:45 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 11 07:08:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 07:08:13 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 11 07:10:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 07:10:38 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 07:13:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 07:13:46 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 11 07:14:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 07:14:04 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 07:18:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 11 07:18:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 07:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 07:20:53 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 07:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 07:24:05 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 11 07:25:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 07:25:27 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 07:29:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 07:29:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 07:31:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 07:31:15 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 07:34:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 07:34:06 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 11 07:36:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 11 07:36:53 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 07:39:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 11 07:39:22 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 07:41:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 07:41:28 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 07:44:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 07:44:12 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 11 07:47:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 07:47:13 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 07:50:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 07:50:53 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 11 07:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 07:51:43 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 07:54:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 07:54:26 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 11 07:58:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 07:58:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 08:02:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 08:02:01 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 11 08:02:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 08:02:14 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 08:04:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 08:04:37 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 11 08:08:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 08:08:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 08:12:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 08:12:02 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 11 08:12:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 11 08:12:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 11 08:14:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 08:14:37 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 11 08:21:35 fir-md1-s1 kernel: Lustre: 23751:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562858488/real 1562858488] req@ffff8f2647e9b600 x1636730239955200/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562858495 ref 1 fl 
Rpc:X/0/ffffffff rc 0/-1 Jul 11 08:22:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 08:22:49 fir-md1-s1 kernel: Lustre: Skipped 181 previous similar messages Jul 11 08:23:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 08:23:08 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 08:23:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 08:23:34 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 08:24:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 08:24:38 fir-md1-s1 kernel: Lustre: Skipped 218 previous similar messages Jul 11 08:32:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 08:32:52 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 11 08:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 08:34:40 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 11 08:35:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 08:35:15 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 08:35:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 11 08:35:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 08:38:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client aa133483-5248-262d-e748-a147c987e0e5 (at 10.9.108.65@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1506141c00, cur 1562859494 expire 1562859344 last 1562859267 Jul 11 08:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 08:42:59 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 08:44:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 08:44:41 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 11 08:45:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 08:45:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 08:45:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 08:45:55 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 08:53:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 08:53:43 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 08:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 08:54:43 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 11 08:55:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 08:55:58 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 11 08:56:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 08:56:32 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 11 08:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9eb449c2-e54f-1e34-81bc-f024b214ecc1 (at 10.9.114.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f18eddc1400, cur 1562860639 expire 1562860489 last 1562860412 Jul 11 08:57:19 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 11 08:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9eb449c2-e54f-1e34-81bc-f024b214ecc1 (at 10.9.114.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4536050400, cur 1562860649 expire 1562860499 last 1562860422 Jul 11 08:57:29 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 11 09:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 09:03:52 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 09:05:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 09:05:05 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 11 09:07:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 09:07:23 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 09:07:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 09:07:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 11 09:11:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2c74246800, cur 1562861476 expire 1562861326 last 1562861249 Jul 11 09:11:16 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 11 09:14:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 09:14:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 09:15:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 09:15:09 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 11 09:17:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3528ecaf-52ef-a9ab-e1d8-8a0bbcc53063 (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f92f8dc00, cur 1562861865 expire 1562861715 last 1562861638 Jul 11 09:17:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3528ecaf-52ef-a9ab-e1d8-8a0bbcc53063 (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b7b501800, cur 1562861871 expire 1562861721 last 1562861644 Jul 11 09:17:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 09:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 09:18:14 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 09:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cd3d0230-3738-e2d9-7e9f-2fd94c27579a (at 10.9.115.5@o2ib4) in 160 seconds. I think it's dead, and I am evicting it. exp ffff8f251b635400, cur 1562861941 expire 1562861791 last 1562861781 Jul 11 09:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cd3d0230-3738-e2d9-7e9f-2fd94c27579a (at 10.9.115.5@o2ib4) in 166 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4518ea2c00, cur 1562861947 expire 1562861797 last 1562861781 Jul 11 09:19:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 09:19:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 09:19:57 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 09:21:26 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 11 09:21:26 fir-md1-s1 kernel: LustreError: 22430:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 108 previous similar messages Jul 11 09:24:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 09:24:24 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 09:25:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 09:25:21 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 11 09:26:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e389cf81-f921-5b21-a2d2-508161f0a482 (at 10.9.114.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e7a75e400, cur 1562862398 expire 1562862248 last 1562862171 Jul 11 09:29:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 09:29:45 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 09:31:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 09:31:34 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 09:34:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 09:34:27 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 09:35:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 09:35:22 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 11 09:41:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 09:41:47 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 09:42:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 09:42:14 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 11 09:44:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 09:44:32 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 09:45:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 09:45:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 11 09:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 683c6245-5f05-50dc-7e48-4fd959186454 (at 10.9.114.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ecf6cdc00, cur 1562863593 expire 1562863443 last 1562863366 Jul 11 09:46:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 09:54:43 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f12d1bee000, cur 1562864083 expire 1562863933 last 1562863856 Jul 11 09:54:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 09:54:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 09:54:47 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 11 09:54:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 11 09:54:54 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 09:55:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 09:55:23 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 09:55:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 09:55:54 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 11 10:04:26 fir-md1-s1 kernel: LustreError: 22059:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 11 10:04:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 10:04:50 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 10:06:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 10:06:05 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 11 10:07:33 fir-md1-s1 kernel: Lustre: 23618:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864845/real 1562864845] req@ffff8f15350b1500 x1636730289756624/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864852 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 10:07:33 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864845/real 1562864845] req@ffff8f0f2d2bbf00 x1636730289756592/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864852 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 10:07:33 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 10:07:40 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864852/real 1562864852] req@ffff8f0cc475a400 x1636730289756576/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 
1562864859 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:07:40 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 10:07:40 fir-md1-s1 kernel: Lustre: 23703:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1299bf8600 x1637045642245168/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:15/0 lens 480/568 e 1 to 0 dl 1562864865 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 10:07:47 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864859/real 1562864859] req@ffff8f0f2d2bbf00 x1636730289756592/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864866 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:07:47 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 10:07:54 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864867/real 1562864867] req@ffff8f0f2d2bbf00 x1636730289756592/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864874 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:07:54 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 11 10:08:01 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864874/real 1562864874] req@ffff8f0cc475a400 x1636730289756576/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864881 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:08:01 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 10:08:15 fir-md1-s1 kernel: Lustre: 23618:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1562864888/real 1562864888] req@ffff8f15350b1500 x1636730289756624/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864895 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:08:15 fir-md1-s1 kernel: Lustre: 23618:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 11 10:08:36 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864909/real 1562864909] req@ffff8f0cc475a400 x1636730289756576/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864916 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:08:36 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562864909/real 1562864909] req@ffff8f0f2d2bbf00 x1636730289756592/t0(0) o106->fir-MDT0000@10.9.112.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562864916 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 10:08:36 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 11 10:08:36 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 10:09:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 88cac62b-9ed1-f52c-09d1-c83e30477915 (at 10.9.112.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f6a8c4c00, cur 1562864941 expire 1562864791 last 1562864714 Jul 11 10:09:01 fir-md1-s1 kernel: Lustre: 23568:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:76s); client may timeout. 
req@ffff8f0527b35100 x1637045642245184/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:15/0 lens 480/536 e 1 to 0 dl 1562864865 ref 1 fl Complete:/0/0 rc 301/301 Jul 11 10:09:01 fir-md1-s1 kernel: Lustre: 23568:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 11 10:12:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 10:12:12 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 11 10:15:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 10:15:00 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 10:16:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 10:16:29 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 11 10:20:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 10:20:45 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 10:23:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 10:23:13 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 11 10:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 10:26:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 10:26:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 10:26:33 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 11 10:32:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 10:33:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 10:33:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 10:35:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 10:36:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 10:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 10:36:31 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 10:36:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 10:36:40 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 11 10:43:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 10:44:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 10:44:30 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 11 10:46:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 10:46:41 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 10:46:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 10:46:41 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 11 10:49:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 10:49:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 10:50:35 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 1522cbeb-4bdd-6d96-7026-321415672330 claims 28672 GRANT, real grant 0 Jul 11 10:54:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 10:54:53 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 11 10:56:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 10:56:44 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 11 10:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 10:57:19 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 10:58:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 10:58:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 11:05:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 11:05:04 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 11 11:06:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 11:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 11:06:50 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 11 11:07:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 11:07:24 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 11:16:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 11:16:52 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 11 11:17:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 11:17:13 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 11 11:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 11:17:37 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 11:17:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 11:17:38 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 11:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 11:26:53 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 11 11:27:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 11:27:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 11:28:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 11:28:15 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 11:33:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 11:33:59 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 11:37:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 11:37:24 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 11 11:37:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 11:37:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 11:38:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 11:38:16 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 11:46:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 11:46:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 11:47:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 11:47:28 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Jul 11 11:47:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 11:47:46 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 11 11:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 11:48:40 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 11:49:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9964d113-fbb3-bb3d-6283-d900df7d14b0 (at 10.9.106.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2502d76c00, cur 1562870941 expire 1562870791 last 1562870714 Jul 11 11:49:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 11:49:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 78d0a867-f444-fefc-cf10-2f40e2381985 (at 10.9.106.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14ebfda400, cur 1562870942 expire 1562870792 last 1562870715 Jul 11 11:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9964d113-fbb3-bb3d-6283-d900df7d14b0 (at 10.9.106.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34f2428800, cur 1562870946 expire 1562870796 last 1562870719 Jul 11 11:50:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 606c5bb8-0e42-25d6-4ebe-304dc77c2b78 (at 10.9.115.4@o2ib4) in 180 seconds. I think it's dead, and I am evicting it. 
exp ffff8f251fd04800, cur 1562871017 expire 1562870867 last 1562870837 Jul 11 11:50:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 606c5bb8-0e42-25d6-4ebe-304dc77c2b78 (at 10.9.115.4@o2ib4) in 187 seconds. I think it's dead, and I am evicting it. exp ffff8f34fdb37c00, cur 1562871022 expire 1562870872 last 1562870835 Jul 11 11:50:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 11:50:22 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562871015/real 1562871015] req@ffff8f0a06665d00 x1636730350774576/t0(0) o106->fir-MDT0000@10.9.106.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562871022 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 11:50:22 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 11 11:50:30 fir-md1-s1 kernel: Lustre: 23572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b92bf7500 x1637046316264608/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:5/0 lens 480/568 e 1 to 0 dl 1562871035 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 11:50:30 fir-md1-s1 kernel: Lustre: 23572:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 11 11:50:36 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562871029/real 1562871029] req@ffff8f0a06665d00 x1636730350774576/t0(0) o106->fir-MDT0000@10.9.106.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562871036 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 11:50:36 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 11:50:57 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562871050/real 1562871050] 
req@ffff8f0a06665d00 x1636730350774576/t0(0) o106->fir-MDT0000@10.9.106.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562871057 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 11:50:57 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 11 11:51:32 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562871085/real 1562871085] req@ffff8f0a06665d00 x1636730350774576/t0(0) o106->fir-MDT0000@10.9.106.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562871092 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 11:51:32 fir-md1-s1 kernel: Lustre: 10506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 11 11:51:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b482f036-7d17-2da6-47f4-65a7cfc97276 (at 10.9.115.6@o2ib4) in 201 seconds. I think it's dead, and I am evicting it. exp ffff8f10e1d1e800, cur 1562871093 expire 1562870943 last 1562870892 Jul 11 11:51:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b66b5931-c739-bfc8-7870-ebb3b0803c4b (at 10.9.115.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f07dd387000, cur 1562871119 expire 1562870969 last 1562870892 Jul 11 11:51:59 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 11 11:52:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8cb89dc0-c88e-79a3-15bf-ba0c55574ada (at 10.9.106.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148b06f800, cur 1562871176 expire 1562871026 last 1562870949 Jul 11 11:52:56 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 11 11:55:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2c00b5b0-7c71-de91-4f51-5cb7b8de22c7 (at 10.9.112.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3d43af8400, cur 1562871337 expire 1562871187 last 1562871110 Jul 11 11:57:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 11:57:36 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 11 11:57:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 11:57:49 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 11 11:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 11:58:57 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 12:02:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2d5099ae-445e-cb63-4a68-05c28b456049 (at 10.9.112.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d6ce4c000, cur 1562871767 expire 1562871617 last 1562871540 Jul 11 12:02:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 11 12:07:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ae2db849-228e-9fc3-9658-c094e0066e91 (at 10.9.102.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14c7771000, cur 1562872047 expire 1562871897 last 1562871820 Jul 11 12:07:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 11 12:07:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 12:07:37 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 11 12:07:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 12:07:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 12:08:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 12:08:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 12:09:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 12:09:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 12:10:01 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872194/real 1562872194] req@ffff8f0b6d0ec200 x1636730360125824/t0(0) o104->fir-MDT0002@10.8.28.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562872201 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 12:10:09 fir-md1-s1 kernel: Lustre: 23556:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ed8ed7200 x1638778776139328/t0(0) o101->61f27ed9-3774-ff36-a4d4-c75cfa800da4@10.9.113.14@o2ib4:14/0 lens 1784/3288 e 1 to 0 dl 1562872214 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 12:10:09 fir-md1-s1 kernel: Lustre: 97651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872202/real 1562872202] req@ffff8f1967e24500 x1636730360127776/t0(0) o104->fir-MDT0002@10.8.28.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562872209 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 12:10:09 fir-md1-s1 kernel: Lustre: 97651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 11 12:10:29 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872222/real 1562872222] req@ffff8f0b6d0ec200 x1636730360125824/t0(0) o104->fir-MDT0002@10.8.28.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562872229 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 12:10:29 fir-md1-s1 
kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 11 12:10:58 fir-md1-s1 kernel: Lustre: 10198:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0964ab8300 x1634132492115616/t0(0) o101->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:3/0 lens 1784/3288 e 1 to 0 dl 1562872263 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 12:10:58 fir-md1-s1 kernel: Lustre: 10198:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 11 12:11:02 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872255/real 1562872255] req@ffff8f13d1dc1b00 x1636730360218672/t0(0) o104->fir-MDT0002@10.8.28.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562872262 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 12:11:02 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages Jul 11 12:11:06 fir-md1-s1 kernel: Lustre: 23594:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1455254b00 x1638788830172848/t0(0) o101->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:11/0 lens 1784/3288 e 0 to 0 dl 1562872271 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 12:11:06 fir-md1-s1 kernel: Lustre: 23594:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 11 12:11:08 fir-md1-s1 kernel: Lustre: 23584:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2a7df04500 x1638779383139440/t0(0) o101->927ebcad-3373-a003-8433-ef313bb0111b@10.8.15.9@o2ib6:13/0 lens 1784/3288 e 0 to 0 dl 1562872273 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 12:11:08 fir-md1-s1 kernel: Lustre: 23584:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 11 12:12:07 fir-md1-s1 
kernel: Lustre: 23716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872320/real 1562872320] req@ffff8f2efdecaa00 x1636730360221808/t0(0) o104->fir-MDT0002@10.8.28.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562872327 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 12:12:07 fir-md1-s1 kernel: Lustre: 20722:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872320/real 1562872320] req@ffff8f1e85cff800 x1636730360221616/t0(0) o104->fir-MDT0002@10.8.28.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562872327 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 12:12:07 fir-md1-s1 kernel: Lustre: 20722:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 138 previous similar messages Jul 11 12:12:07 fir-md1-s1 kernel: Lustre: 23716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 12:12:28 fir-md1-s1 kernel: LustreError: 23651:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.28.1@o2ib6) failed to reply to blocking AST (req@ffff8f0b6d0ec200 x1636730360125824 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2ef68aa640/0x5d9ee63a3c50834b lrc: 4/0,0 mode: PR/PR res: [0x2c002c3a1:0xde1a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.28.1@o2ib6 remote: 0xe501a40d47942476 expref: 2646 pid: 23601 timeout: 1987550 lvb_type: 0 Jul 11 12:12:28 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.28.1@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 11 12:12:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.28.1@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ef68aa640/0x5d9ee63a3c50834b lrc: 3/0,0 mode: PR/PR res: [0x2c002c3a1:0xde1a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.28.1@o2ib6 remote: 0xe501a40d47942476 expref: 2647 pid: 23601 
timeout: 0 lvb_type: 0 Jul 11 12:12:28 fir-md1-s1 kernel: Lustre: 23757:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (104:1s); client may timeout. req@ffff8f2eb226cb00 x1638778791357328/t351854447442(0) o101->9623626c-0b75-9f88-dbc1-9e0f1a45143d@10.9.114.4@o2ib4:3/0 lens 1784/1240 e 1 to 0 dl 1562872347 ref 1 fl Complete:/0/0 rc 0/0 Jul 11 12:12:28 fir-md1-s1 kernel: Lustre: 23757:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 11 12:17:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 12:17:45 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 11 12:18:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 12:18:22 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 12:19:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 12:19:08 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 11 12:19:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 12:19:25 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872758/real 1562872758] req@ffff8f14a62aa400 x1636730364103360/t0(0) o106->fir-MDT0002@10.9.103.12@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562872765 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 12:19:25 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562872758/real 1562872758] req@ffff8f0c72e6a400 x1636730364103376/t0(0) o106->fir-MDT0002@10.9.103.12@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562872765 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 12:19:25 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 47 previous similar messages Jul 11 12:19:25 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 11 12:19:43 fir-md1-s1 kernel: Lustre: 10501:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f12443f3c00 x1637046440268528/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:18/0 lens 480/568 e 0 to 0 dl 1562872788 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 12:22:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b21cb12d-36f3-6903-28db-2805bc9f940b (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f050405a400, cur 1562872928 expire 1562872778 last 1562872701 Jul 11 12:22:08 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 11 12:22:25 fir-md1-s1 kernel: Lustre: 10588:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (178:9s); client may timeout. 
req@ffff8f12443f3c00 x1637046440268528/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:18/0 lens 480/536 e 0 to 0 dl 1562872936 ref 1 fl Complete:/0/0 rc 301/301 Jul 11 12:22:25 fir-md1-s1 kernel: Lustre: 10588:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 11 12:23:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 12:24:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 12:27:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 12:27:47 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 11 12:28:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 12:28:24 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 11 12:29:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 12:29:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 12:29:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 12:36:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 12:37:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 12:37:50 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 11 12:39:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 12:39:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 11 12:39:14 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 11 12:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 12:39:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 12:47:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 12:47:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 12:47:54 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 11 12:49:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 12:49:31 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 12:49:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 12:49:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 12:58:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 12:58:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 11 13:00:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 13:00:06 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 11 13:00:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 13:00:21 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 13:03:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 13:03:30 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 13:08:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 13:08:18 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 13:10:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 13:10:09 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 13:10:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 13:10:23 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 13:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34e6224c00, cur 1562876190 expire 1562876040 last 1562875963 Jul 11 13:16:30 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 11 13:16:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 13:16:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 13:19:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 13:19:08 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 11 13:20:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 13:20:10 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 13:20:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 13:20:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 13:29:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 13:29:28 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 11 13:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5a80badc-02de-d0da-16f2-dd5cc4f34700 (at 10.9.113.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d3560a800, cur 1562876982 expire 1562876832 last 1562876755 Jul 11 13:30:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 13:30:10 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 11 13:30:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 13:30:32 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 13:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 13:30:34 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 13:39:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 13:39:28 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 11 13:40:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 13:40:17 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 13:40:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 13:40:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 11 13:43:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 13:49:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 13:49:36 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 11 13:50:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 13:50:19 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 11 13:50:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 13:50:56 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 11 13:59:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 13:59:39 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 11 14:00:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 14:00:20 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 14:00:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 14:00:59 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 14:05:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 14:05:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 14:08:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 14:10:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 14:10:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 14:10:01 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 14:10:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 14:11:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 14:11:09 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 14:12:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 14:12:05 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 11 14:20:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 14:20:25 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 14:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25004d2800, cur 1562880029 expire 1562879879 last 1562879802 Jul 11 14:20:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 14:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 14:21:22 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 14:22:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 14:22:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 14:23:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 14:31:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 14:31:07 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 11 14:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 14:31:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 14:32:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 14:32:22 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 11 14:37:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 14:37:56 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 11 14:39:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 14:41:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 14:41:33 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 14:41:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 14:41:33 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 11 14:46:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 14:46:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 11 14:48:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 14:48:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 14:51:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 14:51:48 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 14:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 14:51:58 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 14:56:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 14:56:55 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 14:58:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 11 14:58:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 15:01:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 15:01:49 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 11 15:02:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 15:02:15 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 15:06:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 15:06:55 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 15:12:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 15:12:03 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 11 15:12:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 15:12:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 15:13:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 15:13:29 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 15:17:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 15:17:22 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 15:22:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 15:22:04 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 11 15:23:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 15:23:00 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 15:26:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 15:26:55 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 15:28:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 15:28:16 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 11 15:32:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 15:32:07 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 11 15:33:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 15:33:21 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 15:38:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 15:38:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 15:38:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 15:38:27 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 11 15:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 15:42:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 11 15:43:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 15:43:30 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 11 15:48:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 15:48:47 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 11 15:49:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 15:49:20 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 15:52:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 15:52:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 11 15:53:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 15:53:42 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 15:58:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 11 15:58:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 11 16:02:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 16:02:31 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 11 16:03:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 16:03:48 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 16:06:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 16:06:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 16:08:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 16:08:54 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 11 16:12:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 16:12:56 fir-md1-s1 kernel: Lustre: Skipped 127 previous similar messages Jul 11 16:13:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 16:13:49 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 16:18:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 16:18:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 16:19:54 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562887187/real 1562887187] req@ffff8f0769bee600 x1636730486625216/t0(0) o104->fir-MDT0002@10.8.8.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562887194 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 16:19:54 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 75 previous similar messages Jul 11 16:20:02 fir-md1-s1 kernel: Lustre: 23594:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06ff261e00 x1631602292001840/t0(0) o101->8c191431-c80e-a99c-d724-6274df7fd787@10.9.102.10@o2ib4:7/0 lens 1792/3288 e 1 to 0 dl 1562887207 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 16:20:02 fir-md1-s1 kernel: Lustre: 23594:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 11 16:20:03 fir-md1-s1 kernel: 
Lustre: 23756:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26518a6f00 x1631567502066816/t0(0) o101->dacb83f0-b432-ea21-cf1b-fb1ac63fd0b0@10.9.101.62@o2ib4:8/0 lens 576/3264 e 1 to 0 dl 1562887208 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 16:20:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 16:20:05 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 11 16:20:22 fir-md1-s1 kernel: LustreError: 23651:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.8.22@o2ib6) failed to reply to blocking AST (req@ffff8f0769bee600 x1636730486625216 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2ee5d85e80/0x5d9ee63a9becb427 lrc: 4/0,0 mode: PR/PR res: [0x2c0000404:0x479:0x0].0x0 bits 0x13/0x0 rrc: 44 type: IBT flags: 0x60200400000020 nid: 10.8.8.22@o2ib6 remote: 0xadd31f436d5659b6 expref: 24 pid: 97668 timeout: 2002304 lvb_type: 0 Jul 11 16:20:22 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.8.22@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 11 16:20:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.8.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ee5d85e80/0x5d9ee63a9becb427 lrc: 3/0,0 mode: PR/PR res: [0x2c0000404:0x479:0x0].0x0 bits 0x13/0x0 rrc: 44 type: IBT flags: 0x60200400000020 nid: 10.8.8.22@o2ib6 remote: 0xadd31f436d5659b6 expref: 25 pid: 97668 timeout: 0 lvb_type: 0 Jul 11 16:22:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c0afed87-894c-bd68-b6a7-ca4f7af5df99 (at 10.9.103.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1478a93800, cur 1562887356 expire 1562887206 last 1562887129 Jul 11 16:22:44 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562887357/real 1562887357] req@ffff8f0f6dfed400 x1636730487568944/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562887364 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 16:22:44 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 11 16:22:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 16:22:57 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 11 16:23:02 fir-md1-s1 kernel: Lustre: 23573:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0fec7ac200 x1637048159181168/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:7/0 lens 480/568 e 0 to 0 dl 1562887387 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 16:23:02 fir-md1-s1 kernel: Lustre: 23573:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 11 16:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 05c0106b-58e3-3894-4263-1c25034da8ce (at 10.8.15.6@o2ib6) in 178 seconds. I think it's dead, and I am evicting it. exp ffff8f2253775c00, cur 1562887432 expire 1562887282 last 1562887254 Jul 11 16:23:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 11 16:23:52 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:45s); client may timeout. 
req@ffff8f0fec7ac200 x1637048159181168/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:7/0 lens 480/536 e 0 to 0 dl 1562887387 ref 1 fl Complete:/0/0 rc 301/301 Jul 11 16:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 16:24:20 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 11 16:24:41 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ada255a7-6ae1-daa7-1ada-5fa3d62ccfb9 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d18712c00, cur 1562887481 expire 1562887331 last 1562887254 Jul 11 16:24:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 16:25:57 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5afbb881-550f-fa08-cafd-4158b37c9811 (at 10.8.24.16@o2ib6) in 198 seconds. I think it's dead, and I am evicting it. exp ffff8f34eb3aa800, cur 1562887557 expire 1562887407 last 1562887359 Jul 11 16:26:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ac9cd631-a534-1fba-753c-5069b079d1ad (at 10.8.24.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f450560b400, cur 1562887586 expire 1562887436 last 1562887359 Jul 11 16:30:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 16:30:09 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 16:30:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 16:30:33 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 11 16:33:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 16:33:26 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 11 16:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 16:34:30 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 16:42:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 16:42:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 11 16:42:54 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 16:42:54 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1369 previous similar messages Jul 11 16:43:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 16:43:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 16:43:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 16:43:35 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 16:44:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 16:44:42 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 16:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c8996d000, cur 1562888693 expire 1562888543 last 1562888466 Jul 11 16:44:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 16:52:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 16:52:13 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 11 16:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 16:53:46 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 11 16:54:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 16:54:44 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 16:54:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 16:54:53 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 17:03:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 17:03:01 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 17:03:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 17:03:46 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 11 17:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 17:05:00 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 17:06:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 17:06:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 17:13:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 11 17:13:04 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 17:13:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 17:13:46 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 11 17:15:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 17:15:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 17:17:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 17:17:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 17:18:35 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:20:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:22:00 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:23:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 17:23:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 17:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 17:24:07 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 11 17:25:29 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 17:25:29 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 11 17:28:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 17:28:44 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 11 17:34:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 17:34:03 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 11 17:34:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 17:34:13 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 11 17:35:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 17:35:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 17:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 17:40:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 17:43:43 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:43:43 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 11 17:45:18 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:45:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 24ab177c-fa53-1ad7-a4b8-75ee3a88aec0 (at 10.8.8.24@o2ib6) Jul 11 17:45:25 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 11 17:45:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 11 17:45:30 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 17:46:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 17:46:37 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 17:46:38 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:51:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 17:51:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 17:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 17:55:27 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 11 17:55:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 17:55:38 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 17:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 17:57:19 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 17:57:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 17:58:10 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 18:01:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 18:01:07 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 11 18:05:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 18:05:30 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 11 18:05:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 18:05:51 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 18:06:03 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 18:07:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 18:07:43 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 18:09:23 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 18:12:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 18:12:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 18:13:48 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 18:15:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 18:15:34 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 11 18:15:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 18:15:51 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 11 18:17:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 18:17:49 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 11 18:23:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 18:23:19 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 11 18:25:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 18:25:37 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 11 18:25:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 18:25:51 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 11 18:28:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 18:28:05 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 18:34:24 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 18:36:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 18:36:04 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 11 18:38:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 18:38:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 18:38:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 18:38:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 11 18:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 18:40:22 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 18:46:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 18:46:12 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 11 18:48:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 18:48:27 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 11 18:49:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 18:49:02 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 18:51:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 18:51:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 18:56:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 18:56:50 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 11 18:58:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 18:58:50 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 11 18:59:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 18:59:12 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 19:04:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 19:04:44 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 11 19:06:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 19:06:56 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 11 19:09:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 19:09:08 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 11 19:09:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 19:09:38 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 11 19:17:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 19:17:44 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 11 19:19:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 19:19:13 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 19:19:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 19:19:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 19:19:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 19:19:44 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 11 19:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 19:27:59 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 11 19:29:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 19:29:18 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 11 19:29:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 19:29:46 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 11 19:30:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 19:30:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 19:38:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 19:38:11 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 11 19:39:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 19:39:19 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 11 19:40:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 19:40:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 19:43:12 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 19:43:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 19:43:32 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 19:44:47 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 19:48:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 19:48:20 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 11 19:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 11 19:50:19 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 11 19:51:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 19:51:41 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 19:52:12 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 19:55:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 19:55:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 19:55:33 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:55:33 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 55 previous similar messages Jul 11 19:55:34 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:55:34 fir-md1-s1 kernel: Lustre: 10196:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 11 19:55:35 fir-md1-s1 kernel: Lustre: 23568:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:55:35 fir-md1-s1 kernel: Lustre: 23568:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jul 11 19:55:37 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:55:37 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jul 11 19:55:41 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:55:41 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 75 previous similar messages Jul 11 19:55:49 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:55:49 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 105 previous similar messages Jul 11 19:56:05 fir-md1-s1 kernel: Lustre: 23568:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:56:05 fir-md1-s1 kernel: Lustre: 
23568:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 321 previous similar messages Jul 11 19:56:38 fir-md1-s1 kernel: Lustre: 23591:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:56:38 fir-md1-s1 kernel: Lustre: 23591:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1194 previous similar messages Jul 11 19:57:42 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 19:57:42 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 63707 previous similar messages Jul 11 19:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 19:58:47 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 19:59:37 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:00:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 20:00:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 20:01:47 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:01:47 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 11 20:04:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 11 20:04:22 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 20:06:34 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:08:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard 
from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bcba60800, cur 1562900886 expire 1562900736 last 1562900659 Jul 11 20:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 20:08:48 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 11 20:10:36 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:10:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 545f12c1-4799-a254-b9c4-f75f43e1bc5b (at 10.8.27.23@o2ib6) reconnecting Jul 11 20:10:43 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 20:12:27 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:14:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 20:14:15 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 11 20:15:06 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 11 20:15:06 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 34215 previous similar messages Jul 11 20:15:42 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:15:42 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 11 20:15:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 20:15:48 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 11 20:18:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 20:18:53 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 11 20:20:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 20:20:44 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 11 20:22:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:22:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 11 20:26:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 20:26:16 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 20:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 20:29:07 fir-md1-s1 kernel: Lustre: Skipped 82 
previous similar messages Jul 11 20:30:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 92201019-2a0e-37b3-944e-b91d23afff01 (at 10.8.17.26@o2ib6) reconnecting Jul 11 20:30:53 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 11 20:31:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 20:31:30 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 11 20:31:47 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:31:47 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 11 20:36:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 20:36:16 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 11 20:39:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 20:39:19 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 11 20:41:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 20:41:00 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 20:44:20 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 20:44:20 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 7 previous similar messages Jul 11 20:46:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 20:46:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 11 20:46:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 20:46:30 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 11 20:49:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 20:49:32 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 11 20:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b4dc4310-abd3-57a8-960f-a27b33e667d3 (at 10.8.27.7@o2ib6) reconnecting Jul 11 20:51:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 20:57:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 20:57:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 20:57:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 20:57:48 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 11 20:59:38 fir-md1-s1 kernel: Lustre: 21411:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562903971/real 1562903971] req@ffff8f109fecec00 x1636730701721232/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562903978 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 20:59:38 fir-md1-s1 kernel: Lustre: 21411:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Jul 11 20:59:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 20:59:42 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 11 20:59:46 fir-md1-s1 kernel: Lustre: 
23576:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0fb5fe4500 x1637050144562656/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:21/0 lens 480/568 e 1 to 0 dl 1562903991 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 20:59:59 fir-md1-s1 kernel: Lustre: 20571:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562903992/real 1562903992] req@ffff8f0bbd201200 x1636730701721200/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562903999 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 11 20:59:59 fir-md1-s1 kernel: Lustre: 20571:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 11 21:00:12 fir-md1-s1 kernel: Lustre: 21452:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2e868b0c00 x1638729918041728/t0(0) o101->957c1ad0-d547-b44d-0f14-5f92c3213a3d@10.8.15.3@o2ib6:17/0 lens 1800/3288 e 0 to 0 dl 1562904017 ref 2 fl Interpret:/0/0 rc 0/0 Jul 11 21:00:12 fir-md1-s1 kernel: Lustre: 21452:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 11 21:00:22 fir-md1-s1 kernel: LustreError: 23716:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.54@o2ib4) failed to reply to blocking AST (req@ffff8f2661a6ec00 x1636730702142368 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f24f8a82400/0x5d9ee63b0f89a974 lrc: 4/0,0 mode: PR/PR res: [0x2c002c32e:0x413:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0x1fbe402371cceb58 expref: 1619 pid: 97644 timeout: 2019104 lvb_type: 0 Jul 11 21:00:22 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.106.54@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 11 21:00:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock 
callback timer expired after 35s: evicting client at 10.9.106.54@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f24f8a82400/0x5d9ee63b0f89a974 lrc: 4/0,0 mode: PR/PR res: [0x2c002c32e:0x413:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0x1fbe402371cceb58 expref: 1620 pid: 97644 timeout: 0 lvb_type: 0 Jul 11 21:00:22 fir-md1-s1 kernel: Lustre: 21411:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:31s); client may timeout. req@ffff8f0f37cd7b00 x1637050144562816/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:21/0 lens 480/536 e 1 to 0 dl 1562903991 ref 1 fl Complete:/0/0 rc 301/301 Jul 11 21:00:22 fir-md1-s1 kernel: Lustre: 21411:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 11 21:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 21:01:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 11 21:03:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d101a6e2-e864-769d-b612-f06b470f1e70 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22bdf56800, cur 1562904186 expire 1562904036 last 1562903959 Jul 11 21:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 21:08:06 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 11 21:09:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 21:09:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 21:10:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 21:10:19 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 11 21:11:04 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 21:11:04 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 11 21:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 21:11:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 21:12:16 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:12:46 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:12:52 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:12:52 fir-md1-s1 kernel: LustreError: 22650:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 11 21:12:57 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:12:57 fir-md1-s1 kernel: LustreError: 20506:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 11 21:13:02 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:13:02 fir-md1-s1 kernel: 
LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 11 21:13:12 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:13:12 fir-md1-s1 kernel: LustreError: 46557:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 11 21:13:32 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:13:32 fir-md1-s1 kernel: LustreError: 46523:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14 previous similar messages Jul 11 21:14:07 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:14:07 fir-md1-s1 kernel: LustreError: 21740:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 11 21:18:29 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:18:29 fir-md1-s1 kernel: LustreError: 46538:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 54 previous similar messages Jul 11 21:19:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 21:21:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 21:21:00 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 21:21:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 21:21:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 11 21:21:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 21:21:52 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 11 21:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 21:31:09 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 11 21:31:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 21:31:25 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 11 21:32:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 21:32:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 21:36:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 21:36:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 21:41:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 21:41:23 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 11 21:41:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 21:41:29 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 11 21:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 21:42:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 21:50:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 21:52:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 21:52:34 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 21:52:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 21:52:34 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 11 21:52:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 21:52:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 11 21:54:48 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:54:48 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3 previous similar messages Jul 11 21:55:25 fir-md1-s1 kernel: LustreError: 21793:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: 
cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:55:59 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:55:59 fir-md1-s1 kernel: LustreError: 21709:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1 previous similar message Jul 11 21:57:32 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 21:57:32 fir-md1-s1 kernel: LustreError: 22431:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2 previous similar messages Jul 11 22:00:05 fir-md1-s1 kernel: LustreError: 23093:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 32768 GRANT, real grant 0 Jul 11 22:00:05 fir-md1-s1 kernel: LustreError: 23093:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5 previous similar messages Jul 11 22:01:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d8e287a-76f1-2fbc-54c1-19b634c62b63 (at 10.8.24.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24ee4acc00, cur 1562907687 expire 1562907537 last 1562907460 Jul 11 22:01:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 11 22:02:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 22:02:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 11 22:02:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 22:02:40 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 22:02:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 22:02:47 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 11 22:03:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 11 22:03:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 11 22:05:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 22:05:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 11 22:12:36 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 22:12:36 fir-md1-s1 kernel: LustreError: 21682:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 12 previous similar messages Jul 11 22:12:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 22:12:46 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 11 22:13:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 22:13:00 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 11 22:13:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 22:13:54 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 11 22:18:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 22:21:37 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 22:21:37 fir-md1-s1 kernel: LustreError: 44034:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 17 previous similar messages Jul 11 22:23:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 22:23:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 22:23:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 22:23:09 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 11 22:24:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 22:24:10 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 11 22:26:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f30c5ff7c00, cur 1562909186 expire 1562909036 last 1562908959 Jul 11 22:26:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 11 22:29:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 22:31:43 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 22:31:43 fir-md1-s1 kernel: LustreError: 21715:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 11 22:33:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 22:33:11 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 11 22:33:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 11 22:33:11 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 11 22:34:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 22:34:18 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 11 22:42:03 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 32768 GRANT, real grant 0 Jul 11 22:42:03 fir-md1-s1 kernel: LustreError: 21364:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 11 22:43:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 22:43:18 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 11 22:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 22:43:33 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 22:44:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 22:44:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 11 22:52:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
65863bd2-5bf4-3857-2c85-73178bef5ac4 (at 10.9.103.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34f4040c00, cur 1562910731 expire 1562910581 last 1562910504 Jul 11 22:52:20 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 22:52:20 fir-md1-s1 kernel: LustreError: 46512:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 11 22:52:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 22:52:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 11 22:53:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 639ee15e-2da6-9d93-315b-2c6ce5340bd5 (at 10.8.26.2@o2ib6) Jul 11 22:53:18 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 11 22:53:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 22:53:42 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 11 22:54:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 22:54:45 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 11 22:57:56 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562911069/real 1562911069] req@ffff8f30d62d4e00 x1636730844511680/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562911076 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 11 22:57:56 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Jul 11 23:02:32 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) 
fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 23:02:32 fir-md1-s1 kernel: LustreError: 22432:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 11 23:03:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 11 23:03:26 fir-md1-s1 kernel: Lustre: Skipped 163845 previous similar messages Jul 11 23:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 11 23:03:59 fir-md1-s1 kernel: Lustre: Skipped 163803 previous similar messages Jul 11 23:04:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 23:04:53 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 11 23:13:04 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 23:13:04 fir-md1-s1 kernel: LustreError: 20508:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 11 23:13:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 11 23:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 23:13:45 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 11 23:14:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 23:14:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 23:15:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 23:15:02 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 11 23:21:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 23:23:52 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 23:23:52 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 24 previous similar messages Jul 11 23:24:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 11 23:24:02 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 11 23:24:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 11 23:24:26 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 11 23:25:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 23:25:09 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 11 23:32:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 11 23:34:10 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 23:34:10 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 20 previous similar messages Jul 11 23:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 23:34:30 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 11 23:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 23:34:30 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 11 23:34:42 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 11 23:35:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 23:35:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 11 23:44:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 11 23:44:37 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 11 23:44:40 fir-md1-s1 kernel: LustreError: 22059:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 32768 GRANT, real grant 0 Jul 11 23:44:40 fir-md1-s1 kernel: LustreError: 22059:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 11 23:44:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 11 23:44:49 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 11 23:45:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection 
from 10.8.10.21@o2ib6, removing former export from same NID Jul 11 23:45:16 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 11 23:46:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 11 23:54:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 11 23:54:48 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 11 23:54:58 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 11 23:54:58 fir-md1-s1 kernel: LustreError: 46514:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 11 23:55:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 11 23:55:01 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 11 23:55:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 11 23:55:21 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 11 23:59:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:01:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:03:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 00:04:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 00:04:51 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 12 00:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 00:05:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 00:05:06 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 00:05:06 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 12 00:05:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 00:05:30 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 12 00:10:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:11:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:11:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:13:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 00:15:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 00:15:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 00:15:13 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 12 00:15:13 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 00:15:24 fir-md1-s1 kernel: LustreError: 46516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 00:15:24 fir-md1-s1 kernel: LustreError: 46516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 12 00:15:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 00:15:33 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 12 00:18:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:19:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31dcb3e800, cur 1562915977 expire 1562915827 last 1562915750 Jul 12 00:19:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 00:20:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 00:25:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 00:25:20 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 12 00:25:32 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 00:25:32 fir-md1-s1 kernel: LustreError: 46560:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 21 previous similar messages Jul 12 00:25:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 00:25:34 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 00:25:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 00:25:35 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 00:33:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 00:35:44 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 32768 GRANT, real grant 0 Jul 12 00:35:44 fir-md1-s1 kernel: LustreError: 46584:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 12 00:35:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 00:35:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 00:35:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 00:35:48 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 12 00:36:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 00:36:07 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 12 00:37:04 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 00:41:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f231aefd000, cur 1562917303 expire 1562917153 last 1562917076 Jul 12 00:42:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 00:43:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 00:45:51 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 00:45:51 fir-md1-s1 kernel: LustreError: 46559:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 22 previous similar messages Jul 12 00:46:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 00:46:30 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 00:46:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 00:46:30 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 00:48:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 00:48:20 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 00:50:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 00:56:14 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 00:56:14 fir-md1-s1 kernel: LustreError: 25631:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 12 00:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 00:57:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 12 00:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 00:57:53 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 12 00:59:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 00:59:39 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 01:00:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 01:06:22 fir-md1-s1 kernel: LustreError: 23093:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 01:06:22 fir-md1-s1 kernel: LustreError: 23093:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 12 01:07:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 01:07:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 01:08:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 01:08:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 01:08:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 01:08:12 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 12 01:10:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 01:10:29 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 01:16:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 01:16:39 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli ec935c16-6a63-f875-145b-2db5feba3892 claims 28672 GRANT, real grant 0 Jul 12 01:16:39 fir-md1-s1 kernel: LustreError: 21735:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Jul 12 01:18:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 01:18:25 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 01:18:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 01:18:25 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 12 01:20:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 01:20:48 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 12 01:29:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 01:29:01 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 12 01:29:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 01:29:48 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 01:30:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 01:30:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 01:30:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 01:30:54 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 12 01:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 01:39:29 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 01:40:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 01:40:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 01:40:58 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 12 01:40:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 01:41:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 01:41:26 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 01:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 01:49:35 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 01:51:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 01:51:03 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 01:52:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 01:52:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 01:59:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 01:59:13 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 01:59:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 01:59:36 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 12 02:01:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 02:01:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 02:03:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 02:03:27 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 02:09:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 02:09:39 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 02:11:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 02:11:40 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 02:12:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 02:12:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 02:14:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 02:14:39 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 12 02:19:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 02:19:45 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 12 02:21:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 02:21:45 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 02:23:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 02:23:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 02:25:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 02:25:21 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 02:30:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 02:30:29 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 12 02:31:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f0fb7d800, cur 1562923893 expire 1562923743 last 1562923666 Jul 12 02:31:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 02:31:56 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 02:34:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 02:34:50 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 12 02:35:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 02:35:21 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 12 02:40:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 02:40:34 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 02:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 02:43:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 02:45:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 02:45:32 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 02:46:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 02:46:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 02:50:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 02:50:40 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 12 02:54:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 02:54:15 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 12 02:56:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 02:56:30 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 02:58:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 02:58:53 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 03:01:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 03:01:02 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 12 03:04:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 03:04:35 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 03:06:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 03:06:36 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 12 03:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 03:11:12 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 03:14:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b 
(at 10.8.11.6@o2ib6) reconnecting Jul 12 03:14:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 03:17:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 03:17:17 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 03:19:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 03:19:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 03:21:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 03:21:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 03:21:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 03:21:14 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 12 03:24:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 03:24:58 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 12 03:25:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 03:25:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 03:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 03:27:19 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 12 03:31:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 03:31:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 03:31:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 03:31:26 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 12 03:35:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 03:35:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 03:41:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 03:41:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 12 03:41:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 03:41:54 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 03:44:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 03:44:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 03:45:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 03:45:56 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 03:51:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 03:51:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 03:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 03:51:56 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 12 03:54:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 03:54:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 03:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 03:55:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 04:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 04:02:13 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 12 04:03:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 04:03:06 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 04:07:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 04:07:41 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 04:08:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 04:09:29 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929762/real 1562929762] req@ffff8f07bf2e1800 x1636731016960144/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929769 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 12 04:09:29 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 12 04:09:36 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929769/real 1562929769] req@ffff8f1533c1aa00 x1636731016960160/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929776 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:09:37 fir-md1-s1 kernel: Lustre: 23591:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f148c86bc00 x1637052831020032/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:12/0 lens 480/568 e 1 to 0 dl 1562929782 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 04:09:43 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929776/real 1562929776] req@ffff8f1533c1aa00 x1636731016960160/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929783 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:09:43 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 12 04:09:50 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929783/real 1562929783] req@ffff8f07bf2e1800 x1636731016960144/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 
1562929790 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:09:50 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 12 04:09:57 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929790/real 1562929790] req@ffff8f1533c1aa00 x1636731016960160/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929797 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:09:57 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 12 04:10:11 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929804/real 1562929804] req@ffff8f07bf2e1800 x1636731016960144/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929811 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:10:11 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 12 04:10:32 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929825/real 1562929825] req@ffff8f1533c1aa00 x1636731016960160/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929832 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:10:32 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 12 04:11:14 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929867/real 1562929867] req@ffff8f07bf2e1800 x1636731016960144/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929874 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:11:14 fir-md1-s1 kernel: Lustre: 23649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 
11 previous similar messages Jul 12 04:12:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 04:12:28 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 04:12:31 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562929944/real 1562929944] req@ffff8f1533c1aa00 x1636731016960160/t0(0) o106->fir-MDT0002@10.9.103.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562929951 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 04:12:31 fir-md1-s1 kernel: Lustre: 25681:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages Jul 12 04:12:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d5f6cf15-2331-01d2-988c-2d20adf007a2 (at 10.9.103.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522e51800, cur 1562929959 expire 1562929809 last 1562929732 Jul 12 04:12:39 fir-md1-s1 kernel: Lustre: 23649:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (83:114s); client may timeout. 
req@ffff8f148c86bc00 x1637052831020032/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:12/0 lens 480/536 e 1 to 0 dl 1562929845 ref 1 fl Complete:/0/0 rc 301/301 Jul 12 04:12:39 fir-md1-s1 kernel: Lustre: 23649:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 12 04:14:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 04:14:38 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 12 04:17:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 04:17:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 12 04:22:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 04:22:29 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 12 04:24:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 04:24:33 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 12 04:24:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 04:24:41 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 12 04:28:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 04:28:08 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 12 04:32:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 04:32:33 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 12 04:34:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 04:34:47 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 12 04:38:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 04:38:15 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 04:38:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 04:38:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 12 04:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 04:43:05 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 12 04:45:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 04:45:56 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 12 04:48:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 04:48:18 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 04:53:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 04:53:09 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 12 04:56:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 04:56:40 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 04:57:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 04:57:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 04:58:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 04:58:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 05:03:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 05:03:40 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 12 05:07:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 05:07:23 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 05:08:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 05:08:42 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 12 05:11:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 05:11:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 05:13:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 05:13:41 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 12 05:18:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 05:18:39 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 05:19:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 05:19:42 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 05:23:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 05:23:46 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 12 05:29:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 05:29:00 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 12 05:30:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 05:30:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 05:33:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 05:33:47 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 12 05:39:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 05:39:41 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 12 05:40:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 05:40:10 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 05:40:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 05:40:40 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 12 05:43:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 05:43:58 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 12 05:44:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 05:49:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 05:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 05:50:19 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 05:50:57 fir-md1-s1 kernel: Lustre: 21312:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562935850/real 1562935850] req@ffff8f0dbc6cbf00 x1636731058646896/t0(0) o106->fir-MDT0002@10.9.103.22@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562935857 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 12 05:50:57 fir-md1-s1 kernel: Lustre: 21312:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 12 05:51:05 fir-md1-s1 kernel: Lustre: 23605:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0736e85d00 x1637053330878608/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:10/0 lens 480/568 e 1 to 0 dl 1562935870 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 05:51:05 fir-md1-s1 kernel: Lustre: 23605:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 12 05:51:18 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562935871/real 1562935871] req@ffff8f073eb56c00 x1636731058646928/t0(0) o106->fir-MDT0002@10.9.103.22@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562935878 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 05:51:18 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 12 05:52:00 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562935913/real 1562935913] req@ffff8f073eb56900 x1636731058646912/t0(0) o106->fir-MDT0002@10.9.103.22@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1562935920 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 05:52:00 fir-md1-s1 kernel: Lustre: 
23691:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages Jul 12 05:52:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 05:52:08 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 12 05:53:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd387039-9e5c-0c5b-0227-8087faaf7a40 (at 10.9.103.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4536596000, cur 1562935988 expire 1562935838 last 1562935761 Jul 12 05:53:08 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 12 05:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 05:55:03 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 12 06:01:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 06:01:07 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 06:03:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 06:03:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 06:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 06:05:05 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 06:11:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 06:11:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 06:11:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 06:11:47 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 06:13:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 06:13:59 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 06:15:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 06:15:06 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 06:16:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 06:22:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 06:22:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 06:24:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 06:24:55 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 06:25:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 06:25:06 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 06:33:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 06:33:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 06:34:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 06:34:56 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar 
messages Jul 12 06:35:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 06:35:06 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 06:37:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 06:41:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 06:42:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 06:43:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 06:43:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 12 06:45:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 06:45:20 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 12 06:47:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 06:47:15 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 06:53:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 06:53:25 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 12 06:54:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 06:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 06:55:21 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 12 06:58:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 06:58:32 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 12 07:00:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:02:39 fir-md1-s1 kernel: Lustre: 23563:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 12 07:03:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 07:03:27 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 12 07:04:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:05:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 07:05:27 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 12 07:05:29 fir-md1-s1 kernel: Lustre: 23591:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 12 07:09:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 07:09:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 07:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 07:13:54 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 12 07:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 07:15:45 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 07:16:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f37a4d77000, cur 1562940968 expire 1562940818 last 1562940741 Jul 12 07:16:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 07:20:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 07:20:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 07:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 07:24:06 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 12 07:26:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 07:26:02 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 12 07:26:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:27:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 07:28:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:29:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:30:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:32:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:32:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 07:32:27 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 12 07:34:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 07:34:14 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 07:36:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 07:36:43 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 12 07:39:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 07:39:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 07:42:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 07:42:31 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 07:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 07:44:23 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 07:44:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 07:44:48 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 07:46:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 07:46:48 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 12 07:54:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 07:54:25 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 12 07:55:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 07:55:27 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 12 07:56:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 07:56:19 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 12 07:57:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 07:57:05 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 12 08:04:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 08:04:41 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 08:07:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 08:07:27 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 08:07:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 08:07:31 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 12 08:14:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 08:14:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 08:16:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 08:16:56 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 08:17:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 08:17:33 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 12 08:17:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 08:17:33 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 12 08:17:56 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562944669/real 1562944669] req@ffff8f2ec4477200 x1636731118387008/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562944676 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 12 08:17:56 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 27 previous similar messages Jul 12 08:18:04 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e22d5aa00 x1633658408530896/t0(0) o36->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:9/0 lens 496/448 e 1 to 0 dl 1562944689 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 08:18:04 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 161 previous similar messages Jul 12 08:18:05 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ca9e7f500 x1631702352577056/t0(0) o101->2d384d58-fd4c-f6d6-342b-6f9f296484e1@10.9.101.46@o2ib4:10/0 lens 1768/0 e 1 to 0 dl 1562944690 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 12 08:18:05 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 348 previous similar messages Jul 12 08:18:06 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) 
@@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e89247500 x1631636431594032/t0(0) o101->0d8fe43d-85f9-8061-e5fc-2e0ec8fbd940@10.8.7.11@o2ib6:11/0 lens 576/0 e 1 to 0 dl 1562944691 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 12 08:18:06 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 212 previous similar messages Jul 12 08:18:08 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3239eaf800 x1638092360824656/t0(0) o101->95c23571-6ded-28b5-8b2e-63d85e709c23@10.8.15.4@o2ib6:13/0 lens 1768/0 e 1 to 0 dl 1562944693 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 12 08:18:08 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 682 previous similar messages Jul 12 08:18:10 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562944683/real 1562944683] req@ffff8f2ec4477200 x1636731118387008/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562944690 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 08:18:10 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 12 08:18:12 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f27c342fb00 x1636577844268464/t0(0) o101->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:17/0 lens 576/0 e 1 to 0 dl 1562944697 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 12 08:18:12 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 296 previous similar messages Jul 12 08:18:20 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2993ae6000 x1638080775806352/t0(0) 
o101->c0496bb5-bb8d-8fb8-13d2-918f029a4d08@10.8.26.34@o2ib6:25/0 lens 576/0 e 1 to 0 dl 1562944705 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 12 08:18:20 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 453 previous similar messages Jul 12 08:18:24 fir-md1-s1 kernel: LustreError: 23607:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.7@o2ib6) failed to reply to blocking AST (req@ffff8f2ec3c80f00 x1636731118397200 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f348cc0cc80/0x5d9ee63a22157135 lrc: 4/0,0 mode: PR/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 882 type: IBT flags: 0x60200400000020 nid: 10.8.27.7@o2ib6 remote: 0x99e7546245a2f947 expref: 375 pid: 23741 timeout: 2059786 lvb_type: 0 Jul 12 08:18:24 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.7@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 12 08:18:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.27.7@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f348cc0cc80/0x5d9ee63a22157135 lrc: 3/0,0 mode: PR/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 882 type: IBT flags: 0x60200400000020 nid: 10.8.27.7@o2ib6 remote: 0x99e7546245a2f947 expref: 376 pid: 23741 timeout: 0 lvb_type: 0 Jul 12 08:18:31 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562944704/real 1562944704] req@ffff8f2ec4477200 x1636731118387008/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562944711 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 08:18:31 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 12 08:18:36 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f2d99175a00 x1638731879984176/t0(0) o101->159ddaf1-ce95-3830-127f-4856eec7f12f@10.9.116.1@o2ib4:11/0 lens 576/0 e 0 to 0 dl 1562944721 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 12 08:18:36 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2456 previous similar messages Jul 12 08:19:08 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f32346bc800 x1631556903864304/t0(0) o101->2faef2d8-dc67-f384-07b6-111f344194c1@10.9.101.65@o2ib4:13/0 lens 576/0 e 0 to 0 dl 1562944753 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 12 08:19:08 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3432 previous similar messages Jul 12 08:19:13 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562944746/real 1562944746] req@ffff8f2ec4477200 x1636731118387008/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562944753 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 08:19:13 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 12 08:19:19 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562944669, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f236c869200/0x5d9ee63ba86a12ae lrc: 3/1,0 mode: --/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97646 timeout: 0 lvb_type: 0 Jul 12 08:19:19 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Jul 12 08:19:20 fir-md1-s1 kernel: LustreError: 23699:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 
1562944670, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3c80b5f2c0/0x5d9ee63ba86a58e6 lrc: 3/1,0 mode: --/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23699 timeout: 0 lvb_type: 0 Jul 12 08:19:20 fir-md1-s1 kernel: LustreError: 23699:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 238 previous similar messages Jul 12 08:19:22 fir-md1-s1 kernel: LustreError: 10196:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562944672, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f13b7a4bcc0/0x5d9ee63ba86ac948 lrc: 3/1,0 mode: --/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10196 timeout: 0 lvb_type: 0 Jul 12 08:19:22 fir-md1-s1 kernel: LustreError: 10196:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 64 previous similar messages Jul 12 08:19:26 fir-md1-s1 kernel: LustreError: 21420:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562944676, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f05083c4ec0/0x5d9ee63ba86b9a6f lrc: 3/1,0 mode: --/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21420 timeout: 0 lvb_type: 0 Jul 12 08:19:26 fir-md1-s1 kernel: LustreError: 21420:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 105 previous similar messages Jul 12 08:20:12 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f263f9cc200 x1631563254427968/t0(0) o101->1b1a33fc-473d-0f6c-9f25-a44e13708af4@10.8.8.3@o2ib6:17/0 lens 576/0 e 0 to 0 dl 
1562944817 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 12 08:20:12 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 105533 previous similar messages Jul 12 08:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9c359101-0ed1-475a-f1ab-59f22d57209c (at 10.8.22.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ed7a1800, cur 1562944821 expire 1562944671 last 1562944594 Jul 12 08:20:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5990ca21-7371-f423-fd1a-20751dbd1238 (at 10.9.103.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15360d7000, cur 1562944822 expire 1562944672 last 1562944595 Jul 12 08:20:22 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 12 08:20:22 fir-md1-s1 kernel: Lustre: 23589:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:115s); client may timeout. req@ffff8f090e628f00 x1637053928455488/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:27/0 lens 616/0 e 0 to 0 dl 1562944707 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 12 08:20:22 fir-md1-s1 kernel: LustreError: 23736:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.104.22@o2ib4: deadline 30:1s ago req@ffff8f3e90db2700 x1631571371738656/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:21/0 lens 576/0 e 0 to 0 dl 1562944821 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 12 08:20:22 fir-md1-s1 kernel: LustreError: 23736:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 12 08:20:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 08:20:22 fir-md1-s1 kernel: Lustre: 23589:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 93557 previous similar messages Jul 12 08:20:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 12 08:24:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 08:24:52 fir-md1-s1 kernel: Lustre: Skipped 20771 previous similar messages Jul 12 08:26:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a68d9e6e-5a78-ad1d-4abc-986d23128d99 (at 10.9.113.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ae3559000, cur 1562945193 expire 1562945043 last 1562944966 Jul 12 08:26:33 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 08:28:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 08:28:36 fir-md1-s1 kernel: Lustre: Skipped 20806 previous similar messages Jul 12 08:28:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 08:28:49 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 08:30:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 08:30:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 08:33:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 66aa3c20-db51-1ee2-67da-24de875c7f64 (at 10.9.113.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2a79848000, cur 1562945628 expire 1562945478 last 1562945401 Jul 12 08:33:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 08:34:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 08:34:54 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 12 08:36:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 08:36:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 08:38:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 08:38:42 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 12 08:38:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 08:38:52 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 12 08:40:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ef54d74-26c7-dd87-45f7-921d9e4ba654 (at 10.9.115.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e96392400, cur 1562946046 expire 1562945896 last 1562945819 Jul 12 08:40:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 08:45:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 08:45:06 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 08:47:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c052722f-fb7b-9d40-a2d8-d22451dc2117 (at 10.9.115.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22a6103000, cur 1562946471 expire 1562946321 last 1562946244 Jul 12 08:47:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 08:48:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2b7395d9-c40c-f531-147e-33ca0a08dcda (at 10.8.22.12@o2ib6) Jul 12 08:48:46 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 12 08:48:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 08:48:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 08:49:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 08:49:17 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 12 08:55:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 08:55:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 08:58:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 08:58:50 fir-md1-s1 kernel: Lustre: Skipped 136 previous similar messages Jul 12 08:59:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 12 08:59:22 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 09:01:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 09:01:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 09:05:22 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562947511/real 1562947511] req@ffff8f1299d1a100 x1636731136281312/t0(0) o104->fir-MDT0000@10.9.115.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562947522 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 12 09:05:22 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 12 09:05:26 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1c5e9f3600 x1631559599111984/t0(0) o101->bb17aca1-57d8-f36a-a79b-bcdcd36ec002@10.8.18.20@o2ib6:1/0 lens 576/3264 e 1 to 0 dl 1562947531 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 09:05:26 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3630 previous similar messages Jul 12 09:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bb17aca1-57d8-f36a-a79b-bcdcd36ec002 (at 10.8.18.20@o2ib6) reconnecting Jul 12 09:05:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 09:05:33 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562947522/real 1562947522] req@ffff8f1299d1a100 x1636731136281312/t0(0) o104->fir-MDT0000@10.9.115.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562947533 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 09:05:42 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23edf04800 x1634116848491408/t0(0) o101->2aa758e4-fe35-42c9-321f-e6d541fd5bfd@10.8.27.17@o2ib6:17/0 lens 576/0 e 1 to 0 dl 1562947547 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 12 09:05:42 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 
2566 previous similar messages Jul 12 09:05:55 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562947544/real 1562947544] req@ffff8f1299d1a100 x1636731136281312/t0(0) o104->fir-MDT0000@10.9.115.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562947555 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 09:05:55 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 12 09:06:14 fir-md1-s1 kernel: Lustre: 20464:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f319fcfdd00 x1631567667024064/t0(0) o101->2dd7454a-4666-cb77-2a9b-10ada81c5a76@10.8.18.27@o2ib6:19/0 lens 576/0 e 0 to 0 dl 1562947579 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 12 09:06:14 fir-md1-s1 kernel: Lustre: 20464:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 40577 previous similar messages Jul 12 09:06:39 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562947588/real 1562947588] req@ffff8f1299d1a100 x1636731136281312/t0(0) o104->fir-MDT0000@10.9.115.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1562947599 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 09:06:39 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 12 09:06:41 fir-md1-s1 kernel: LustreError: 20952:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562947511, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f15bdafee40/0x5d9ee63bad350cbb lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20952 timeout: 0 lvb_type: 0 Jul 12 09:06:41 fir-md1-s1 kernel: LustreError: 
20952:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 24 previous similar messages Jul 12 09:06:42 fir-md1-s1 kernel: LustreError: 23648:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562947512, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0b0d06d340/0x5d9ee63bad353b84 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23648 timeout: 0 lvb_type: 0 Jul 12 09:06:42 fir-md1-s1 kernel: LustreError: 23648:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 244 previous similar messages Jul 12 09:06:44 fir-md1-s1 kernel: LustreError: 21369:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562947514, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f205e2772c0/0x5d9ee63bad35416c lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21369 timeout: 0 lvb_type: 0 Jul 12 09:06:44 fir-md1-s1 kernel: LustreError: 21369:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 45 previous similar messages Jul 12 09:06:48 fir-md1-s1 kernel: LustreError: 21145:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562947518, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3fa133b600/0x5d9ee63bad354c86 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21145 timeout: 0 lvb_type: 0 Jul 12 09:06:48 fir-md1-s1 kernel: LustreError: 21145:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 70 previous similar messages Jul 12 09:07:18 fir-md1-s1 kernel: Lustre: 
23622:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2eea337500 x1631571215696624/t0(0) o101->3ef17f0c-d35b-8428-c1da-c84a40a8bdbc@10.9.101.71@o2ib4:23/0 lens 576/0 e 0 to 0 dl 1562947643 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 12 09:07:18 fir-md1-s1 kernel: Lustre: 23622:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 102499 previous similar messages Jul 12 09:07:45 fir-md1-s1 kernel: LustreError: 25676:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.115.10@o2ib4) failed to reply to blocking AST (req@ffff8f1299d1a100 x1636731136281312 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1b321e8000/0x5d9ee63baa823441 lrc: 4/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x60200400000020 nid: 10.9.115.10@o2ib4 remote: 0x9dc480fd8d5c83c4 expref: 12 pid: 20720 timeout: 2062863 lvb_type: 0 Jul 12 09:07:45 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.115.10@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 12 09:07:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.115.10@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1b321e8000/0x5d9ee63baa823441 lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x60200400000020 nid: 10.9.115.10@o2ib4 remote: 0x9dc480fd8d5c83c4 expref: 13 pid: 20720 timeout: 0 lvb_type: 0 Jul 12 09:07:45 fir-md1-s1 kernel: Lustre: 23730:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:123s); client may timeout. 
req@ffff8f4048976000 x1631541592774480/t0(0) o101->5735cd86-3a30-362c-bc05-c634d3fa1859@10.9.107.11@o2ib4:12/0 lens 576/0 e 1 to 0 dl 1562947542 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 12 09:07:45 fir-md1-s1 kernel: LustreError: 23637:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.4@o2ib4: deadline 100:22s ago req@ffff8f1d7818e600 x1638869204205952/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1562947643 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 12 09:07:45 fir-md1-s1 kernel: LustreError: 23637:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 77 previous similar messages Jul 12 09:07:45 fir-md1-s1 kernel: Lustre: 23730:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 92935 previous similar messages Jul 12 09:07:45 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 12 09:07:45 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 12 09:08:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dc3fcf2e-e5b5-2903-ff5f-2681ca61121a (at 10.9.115.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f39cbea9800, cur 1562947703 expire 1562947553 last 1562947476 Jul 12 09:08:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 09:08:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 09:08:56 fir-md1-s1 kernel: Lustre: Skipped 26772 previous similar messages Jul 12 09:09:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 09:09:24 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 09:15:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 09:15:33 fir-md1-s1 kernel: Lustre: Skipped 26731 previous similar messages Jul 12 09:19:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 09:19:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 09:19:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 09:19:03 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 12 09:20:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 09:20:54 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 12 09:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 09:26:42 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 09:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 09:29:13 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 12 09:29:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 09:29:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 09:30:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 09:30:54 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 12 09:36:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 09:36:45 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 09:39:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 09:39:22 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 12 09:40:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 09:40:48 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 09:41:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 09:41:16 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 09:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 09:47:17 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 09:49:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 09:49:30 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 12 09:51:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 09:51:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 09:51:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 09:51:56 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 09:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 09:57:32 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 12 09:58:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 507fe63e-2eba-dc91-49a6-94f7b912a620 (at 10.9.112.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4502cd5c00, cur 1562950709 expire 1562950559 last 1562950482 Jul 12 09:58:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 12 09:59:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 09:59:31 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 12 10:02:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 10:02:36 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 12 10:06:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 10:06:08 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 12 10:08:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 10:08:20 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 10:09:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 10:09:38 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 12 10:12:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 10:12:37 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 12 10:18:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 10:18:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 10:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 10:18:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 10:19:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 10:19:53 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 12 10:22:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 10:22:40 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 10:28:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 10:28:27 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 12 10:30:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 
10.8.12.12@o2ib6) Jul 12 10:30:04 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 12 10:33:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 10:33:58 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 10:38:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 10:38:35 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 10:40:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 10:40:09 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 12 10:41:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d2e199c6-98a7-0717-8652-10bd2bf787b1 (at 10.8.24.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2530a40400, cur 1562953269 expire 1562953119 last 1562953042 Jul 12 10:41:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 10:44:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 10:44:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 12 10:48:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 10:48:49 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 10:49:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 10:50:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 10:50:22 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 12 10:54:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 10:56:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 10:56:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 10:58:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 10:58:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 11:00:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 11:00:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 11:00:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 11:00:27 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 11:06:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 11:06:19 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 11:08:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 11:08:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 11:08:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2f8cf2ac-3786-7722-1124-7c8b6ba37f05 (at 10.9.112.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23163fd400, cur 1562954899 expire 1562954749 last 1562954672 Jul 12 11:08:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 11:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 11:09:13 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 11:10:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 11:10:37 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 12 11:16:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 11:16:48 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 11:19:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 11:19:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 12 11:19:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 11:19:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 11:19:28 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 12 11:20:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 11:20:49 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 12 11:27:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 11:27:02 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 11:30:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 11:30:13 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 12 11:31:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 11:31:03 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 12 11:32:40 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562956353/real 1562956353] req@ffff8f10a7f30000 x1636731199047520/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562956360 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 12 11:32:40 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 12 11:32:48 fir-md1-s1 kernel: Lustre: 23629:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ad0ec1500 x1637054700457712/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:23/0 lens 480/568 e 1 to 0 dl 1562956373 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 11:32:48 fir-md1-s1 kernel: Lustre: 23629:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4553 previous similar messages Jul 12 11:32:54 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for slow reply: [sent 1562956367/real 1562956367] req@ffff8f10a7f30000 x1636731199047520/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562956374 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 11:32:54 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 12 11:33:15 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562956388/real 1562956388] req@ffff8f10a7f30000 x1636731199047520/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562956395 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 11:33:15 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 12 11:33:57 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562956430/real 1562956430] req@ffff8f10a7f30000 x1636731199047520/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562956437 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 11:33:57 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 12 11:35:14 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562956507/real 1562956507] req@ffff8f10a7f30000 x1636731199047520/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562956514 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 11:35:14 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Jul 12 11:35:51 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2a234d0c-a8ab-8feb-25f1-bf6554cceb02 (at 10.9.106.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f153524c000, cur 1562956551 expire 1562956401 last 1562956324 Jul 12 11:35:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 11:35:53 fir-md1-s1 kernel: LNet: Service thread pid 23589 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 12 11:35:53 fir-md1-s1 kernel: Pid: 23589, comm: mdt00_071 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 12 11:35:53 fir-md1-s1 kernel: Call Trace: Jul 12 11:35:53 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 12 11:35:53 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 12 11:35:53 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 12 11:35:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 12 11:35:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 12 11:35:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 12 11:35:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 12 11:35:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 12 11:35:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1562956553.23589 Jul 12 11:35:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 88249aca-f8a5-51dd-af36-5041bca337b5 (at 10.8.16.7@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2522963400, cur 1562956557 expire 1562956407 last 1562956330 Jul 12 11:35:57 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 11:35:57 fir-md1-s1 kernel: LNet: Service thread pid 23589 completed after 203.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 12 11:35:57 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 12 11:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 400d6bb2-cc30-d980-7d8b-e0cf4a3a30a0 (at 10.9.107.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148c8f3800, cur 1562956561 expire 1562956411 last 1562956334 Jul 12 11:36:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 11:37:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 11:37:41 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 12 11:37:45 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0cde604b00 x1637054711613040/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:20/0 lens 480/568 e 0 to 0 dl 1562956670 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 11:37:48 fir-md1-s1 kernel: Lustre: 10589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562956661/real 1562956661] req@ffff8f127ceb2d00 x1636731200892032/t0(0) o106->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1562956668 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 11:37:48 fir-md1-s1 kernel: Lustre: 10589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Jul 12 11:40:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 11:40:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 11:40:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 11:40:16 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 12 11:40:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client be42b497-ab1b-8d58-3101-014aad577cfc (at 10.8.27.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f8ad6000, cur 1562956829 expire 1562956679 last 1562956602 Jul 12 11:40:29 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 11:41:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 11:41:04 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 12 11:41:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 11:41:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 11:44:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 11:44:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 11:48:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 11:48:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 11:50:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 11:50:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 11:51:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 11:51:06 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 12 11:51:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 11:51:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 11:53:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 62b59a8a-bc87-45e0-45ad-94363e33396b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f330b675000, cur 1562957588 expire 1562957438 last 1562957361 Jul 12 11:53:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 11:58:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 11:58:53 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 12:01:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 12:01:11 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 12 12:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 12:01:42 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 12 12:02:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 12:02:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 12:10:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 12:10:20 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 12:11:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 19effcd6-8030-8ae1-d9d6-24266f7c8d3c (at 10.8.27.35@o2ib6) Jul 12 12:11:19 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 12 12:11:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dc10f947-e401-3136-94f5-752e472b9896 (at 10.9.103.10@o2ib4) in 207 seconds. I think it's dead, and I am evicting it. exp ffff8f1478a02400, cur 1562958693 expire 1562958543 last 1562958486 Jul 12 12:11:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 12:11:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 12:11:42 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 12:14:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 12:14:46 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 12 12:16:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 054d1548-cdcc-8b5b-1ec4-5ec77e76503f (at 10.8.12.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25013e4400, cur 1562959002 expire 1562958852 last 1562958775 Jul 12 12:16:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 12 12:21:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 12:21:14 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 12:21:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 12:21:32 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 12 12:21:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 12:21:54 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 12 12:23:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dce47606-d438-ab63-01f6-1079880f0e28 (at 10.8.17.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a2a1bcc00, cur 1562959394 expire 1562959244 last 1562959167 Jul 12 12:23:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 12:29:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 12:29:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 12:31:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 84fd8c4b-6545-cd41-282d-ef5f651cba30 (at 10.8.17.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2529f98c00, cur 1562959916 expire 1562959766 last 1562959689 Jul 12 12:31:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 12:31:56 fir-md1-s1 kernel: LustreError: 55142:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f07828d8000 x1636731232215392/t0(0) o105->fir-MDT0002@10.8.17.11@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 12 12:31:56 fir-md1-s1 kernel: LustreError: 55142:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 12 12:32:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 12:32:08 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 12:32:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 12:32:08 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 12 12:32:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 12:32:21 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 12:33:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3a0d0e87-adb3-f22f-9e98-cf9d12330e59 (at 10.9.113.15@o2ib4) in 185 seconds. I think it's dead, and I am evicting it. exp ffff8f34f438e400, cur 1562959992 expire 1562959842 last 1562959807 Jul 12 12:33:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 12:33:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8072542a-c77e-8c5c-c60e-0629def56e65 (at 10.9.113.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25208a8000, cur 1562960034 expire 1562959884 last 1562959807 Jul 12 12:42:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 12 12:42:31 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 12:42:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 12:42:31 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 12 12:42:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 12:42:56 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 12:53:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 12:53:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 12:53:02 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 12 12:53:02 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 12:53:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 12:53:11 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 12:58:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 12:58:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 13:03:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 13:03:02 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 13:03:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 13:03:02 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 13:03:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 13:03:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 13:04:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 13:04:05 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 13:05:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 13:05:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 13:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 13:13:05 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 12 13:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 13:13:05 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 12 13:15:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4dc6ad45-c67c-15d0-5638-611b0defe5f9 (at 10.8.16.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f45018ea400, cur 1562962520 expire 1562962370 last 1562962293 Jul 12 13:15:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 12 13:15:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 13:15:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 13:18:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6b363170-4e30-8684-0ee2-d3bef7a36f68 (at 10.9.103.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4518d83000, cur 1562962680 expire 1562962530 last 1562962453 Jul 12 13:18:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 13:18:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client abffd720-a2aa-412e-9038-98cd76f7763d (at 10.9.103.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fdc31c00, cur 1562962696 expire 1562962546 last 1562962469 Jul 12 13:18:16 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 12 13:19:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 13458280-a046-3a7f-2bec-0301aba013a1 (at 10.8.28.12@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff8f1473afa000, cur 1562962756 expire 1562962606 last 1562962545 Jul 12 13:19:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 13:19:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d0d1dcda-abd5-29f1-1250-5971b6db7d8a (at 10.8.28.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ea41b800, cur 1562962772 expire 1562962622 last 1562962545 Jul 12 13:19:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 13:21:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2104b6c000, cur 1562962869 expire 1562962719 last 1562962642 Jul 12 13:21:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 12 13:23:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 13:23:52 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 12 13:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 13:23:56 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 13:26:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 13:26:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 13:34:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 13:34:20 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 12 13:34:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 13:34:20 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 13:36:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 13:36:13 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 13:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 13:44:53 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 13:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 13:44:53 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 12 13:45:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 13:45:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 13:46:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 13:46:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 13:46:26 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 13:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 13:55:03 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 13:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 13:55:03 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 12 13:56:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 13:56:34 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 12 13:58:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 94b66082-9a8c-20cb-2dfb-0baa5381ec3e (at 10.9.104.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3504108400, cur 1562965126 expire 1562964976 last 1562964899 Jul 12 14:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 14:05:05 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 12 14:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 14:05:05 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 12 14:06:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 14:06:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 14:13:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 14:15:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 14:15:29 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 12 14:15:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 14:15:40 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 12 14:17:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 14:17:00 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 12 14:25:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 14:25:43 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 14:25:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 14:25:43 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar 
messages Jul 12 14:27:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 14:27:17 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 12 14:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 50f228d0-9830-8cb0-9089-89882ee52793 (at 10.9.113.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2528ac3000, cur 1562967139 expire 1562966989 last 1562966912 Jul 12 14:32:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 14:35:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 14:35:47 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 12 14:36:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 14:36:11 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 14:38:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 14:38:08 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 12 14:42:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 14:43:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 14:45:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 14:45:47 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 12 14:47:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 14:47:02 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 14:48:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 14:48:20 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 12 14:55:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 14:55:51 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 12 14:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 14:57:11 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 15:00:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 15:00:42 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 15:05:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:05:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 15:05:59 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 12 15:06:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 15:07:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 15:07:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 15:07:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 715f02cb-8e2e-f659-95b8-6785da84ae98 (at 10.8.30.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f269ebcf400, cur 1562969427 expire 1562969277 last 1562969200 Jul 12 15:10:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:12:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 15:12:30 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 12 15:13:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c271ddbf-2f8d-722d-f50f-1f7affd6178d (at 10.9.115.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0f057acc00, cur 1562969622 expire 1562969472 last 1562969395 Jul 12 15:13:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 12 15:16:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8b6431ee-4a59-d217-cf47-d826ce17927f (at 10.9.112.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f363afd5000, cur 1562969776 expire 1562969626 last 1562969549 Jul 12 15:16:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:16:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 15:16:39 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 12 15:17:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 15:17:12 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 15:19:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:22:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:23:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9c2eb81b-3f24-241f-5bf3-071355b5c7e1 (at 10.9.112.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d3673f000, cur 1562970211 expire 1562970061 last 1562969984 Jul 12 15:23:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:24:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 15:25:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 15:25:38 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 12 15:26:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 15:26:58 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 15:27:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 15:27:49 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 15:28:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:29:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:30:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:31:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:33:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 274acbe5-1f09-1bc7-1d04-06ba56c47198 (at 10.8.25.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d26bc5400, cur 1562970796 expire 1562970646 last 1562970569 Jul 12 15:33:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:35:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:37:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 15:37:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 15:37:00 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 12 15:37:00 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 12 15:38:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 15:38:03 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 15:38:40 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 12 15:40:42 fir-md1-s1 kernel: Lustre: 23565:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 15:40:57 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 15:40:57 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages Jul 12 15:41:05 fir-md1-s1 kernel: Lustre: 23687:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 15:41:08 fir-md1-s1 kernel: Lustre: 10589:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 15:42:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96d3d94a-0025-4481-959e-9b59edd190d8 (at 10.9.113.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c18054800, cur 1562971377 expire 1562971227 last 1562971150 Jul 12 15:42:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:44:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c623c6e5-2a28-10b9-ccff-a82c94121897 (at 10.8.15.6@o2ib6) in 201 seconds. I think it's dead, and I am evicting it. exp ffff8f29f7853000, cur 1562971453 expire 1562971303 last 1562971252 Jul 12 15:44:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:46:12 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 15:46:12 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jul 12 15:47:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 15:47:03 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 12 15:47:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 15:47:12 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 12 15:48:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 15:48:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 15:50:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 092321e4-f4f0-9526-3615-cc8623ccd65a (at 10.9.115.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19774e1800, cur 1562971820 expire 1562971670 last 1562971593 Jul 12 15:50:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:50:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 15:51:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:52:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:53:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 83d92eeb-0189-3899-1ebe-4ec09cf09eb2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25df30c800, cur 1562972000 expire 1562971850 last 1562971773 Jul 12 15:53:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:54:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 15:57:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 15:57:16 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 12 15:57:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 15:57:18 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 12 15:57:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1461e82f-da19-d2c0-6023-1022ba7a9852 (at 10.9.115.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f191a006400, cur 1562972264 expire 1562972114 last 1562972037 Jul 12 15:57:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 15:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 15:58:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 16:00:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 16:03:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 16:07:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 16:07:20 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 12 16:08:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 16:08:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 12 16:08:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 16:08:18 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 12 16:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5f76e786-77c9-ffeb-e686-315d04a0455d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e0f6eb400, cur 1562973056 expire 1562972906 last 1562972829 Jul 12 16:10:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 16:14:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 16:14:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 16:17:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 16:17:20 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 12 16:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 16:18:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 16:19:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 16:19:24 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 12 16:24:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 16:24:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 16:26:04 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562973957/real 1562973957] req@ffff8f1a499cfb00 x1636731502998496/t0(0) o104->fir-MDT0002@10.8.27.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562973964 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 12 16:26:04 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 44 previous similar messages Jul 12 16:26:12 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bfea45400 x1638241009838816/t0(0) o101->b74b4b66-65f0-f951-331c-463b7f96e033@10.9.0.62@o2ib4:17/0 lens 1768/3288 e 1 to 0 dl 1562973977 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 16:26:12 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 12 16:26:17 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22f5587500 x1638242439524656/t0(0) o101->83b4afa2-a367-a71c-8602-481ad43297ce@10.8.0.68@o2ib6:22/0 lens 592/3264 e 1 to 0 dl 1562973982 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 16:26:17 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 12 16:26:46 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1562973999/real 1562973999] req@ffff8f1a499cfb00 x1636731502998496/t0(0) o104->fir-MDT0002@10.8.27.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1562974006 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 12 16:26:46 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 12 16:26:52 fir-md1-s1 kernel: Lustre: 
10585:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3366898c00 x1631578363371568/t0(0) o101->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:27/0 lens 480/568 e 0 to 0 dl 1562974017 ref 2 fl Interpret:/0/0 rc 0/0 Jul 12 16:26:52 fir-md1-s1 kernel: Lustre: 10585:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 12 16:27:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Jul 12 16:27:21 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 12 16:27:27 fir-md1-s1 kernel: LustreError: 23749:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562973957, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3312bbcc80/0x5d9ee63c05ae18f3 lrc: 3/1,0 mode: --/PR res: [0x2c002c34d:0x696:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23749 timeout: 0 lvb_type: 0 Jul 12 16:27:27 fir-md1-s1 kernel: LustreError: 23749:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 59 previous similar messages Jul 12 16:27:32 fir-md1-s1 kernel: LustreError: 21428:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1562973962, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f182b4a3180/0x5d9ee63c05b1a70f lrc: 3/1,0 mode: --/PR res: [0x2c002c34d:0x696:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21428 timeout: 0 lvb_type: 0 Jul 12 16:27:32 fir-md1-s1 kernel: LustreError: 21428:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 12 16:27:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8c306b25-6991-df5d-1f1e-98e88c217f74 (at 
10.8.27.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3995310400, cur 1562974056 expire 1562973906 last 1562973829 Jul 12 16:27:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 16:27:37 fir-md1-s1 kernel: LustreError: 23704:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ecb194b00 x1636731505256496/t0(0) o104->fir-MDT0000@10.8.27.7@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 12 16:28:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 16:28:43 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 16:30:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 16:30:29 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 16:36:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 16:36:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 16:37:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 16:37:49 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 12 16:39:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 16:39:06 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 16:41:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 16:41:50 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 12 16:47:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 16:47:51 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 12 16:49:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 16:49:15 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 16:50:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 16:50:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 16:52:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 16:52:02 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 16:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 16:57:53 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 12 16:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 16:59:20 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 17:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f8cb00db-6694-c576-4092-4a678c6e80f9 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f44ec62a400, cur 1562976004 expire 1562975854 last 1562975777 Jul 12 17:00:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 17:01:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 17:01:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 17:02:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 17:02:30 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 17:08:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 17:08:11 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 12 17:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 17:09:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 12 17:12:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 17:12:46 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 17:15:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 17:15:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 17:18:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 17:18:29 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 12 17:20:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 17:20:17 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 17:23:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 17:23:31 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 12 17:28:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 17:28:47 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 12 17:30:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 17:30:30 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 12 17:33:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 17:33:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 17:33:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 17:33:43 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 17:38:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 17:38:53 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 12 17:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 17:41:11 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 12 17:44:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 17:44:55 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 17:49:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 17:49:05 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 17:50:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 17:50:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 12 17:51:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 17:51:45 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 17:55:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 17:55:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 12 17:57:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb (at 10.9.112.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f147812a000, cur 1562979472 expire 1562979322 last 1562979245 Jul 12 17:57:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 17:59:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 17:59:10 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 18:01:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 18:01:51 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 18:05:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 18:05:23 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 12 18:07:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 18:07:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 18:09:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 12 18:09:18 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 12 18:12:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 18:12:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 18:15:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 18:15:42 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 12 18:18:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 18:19:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 18:19:19 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 12 18:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 18:23:19 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 18:26:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 12 18:26:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 18:29:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 18:29:24 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 12 18:29:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 18:29:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 18:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 18:35:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 12 18:36:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 18:36:47 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 18:39:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 18:39:25 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 12 18:41:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dab487b8-ac88-5102-eda2-bdced899b20d (at 10.8.8.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2bcecf6000, cur 1562982092 expire 1562981942 last 1562981865 Jul 12 18:41:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 12 18:45:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 18:45:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 18:46:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 18:46:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 18:49:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 18:49:28 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 12 18:50:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 18:50:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 18:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 18:55:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 18:56:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 18:56:50 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 12 18:59:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 18:59:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 18:59:29 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 12 19:05:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 19:05:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 19:07:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 19:07:23 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 19:09:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 19:09:37 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 12 19:10:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 19:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 19:15:45 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 12 19:17:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 19:17:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 19:19:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 19:19:42 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 12 19:22:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 19:22:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 19:26:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 19:26:36 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 19:28:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 19:28:01 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 12 19:28:36 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 12 19:29:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 19:29:51 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 12 19:38:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 19:38:33 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 19:38:50 fir-md1-s1 
kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 19:38:50 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 19:39:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 19:39:51 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 19:48:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 19:48:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 19:50:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 19:50:07 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 12 19:50:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 19:50:07 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 12 19:58:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 19:58:46 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 12 20:00:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 20:00:08 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 12 20:00:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 20:00:08 fir-md1-s1 kernel: Lustre: Skipped 132 previous similar messages Jul 12 20:01:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 20:01:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 12 20:09:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 20:09:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 20:10:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 20:10:19 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 12 20:10:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 20:10:19 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 12 20:18:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 20:18:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 20:19:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 20:19:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 20:20:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 20:20:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 20:20:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 20:20:20 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 12 20:20:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 20:27:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 20:29:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 20:29:31 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 12 20:30:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 20:30:25 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 12 20:30:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 20:31:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 20:31:27 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 12 20:39:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 20:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 20:39:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 12 20:40:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 20:40:35 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 12 20:42:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 20:42:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 20:42:05 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 12 20:48:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 20:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 20:50:12 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 20:50:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 20:50:39 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 20:53:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 20:53:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 12 21:00:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 21:00:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 21:00:43 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 21:00:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 21:00:43 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 12 21:03:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 21:03:13 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 12 21:10:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 21:10:47 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 12 21:10:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 21:10:50 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 12 21:13:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 21:13:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 21:16:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 21:21:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 21:21:26 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 12 21:21:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 21:21:27 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 21:23:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 21:23:31 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 21:29:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 21:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 21:31:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 21:31:37 fir-md1-s1 kernel: Lustre: Skipped 126 previous similar messages Jul 12 21:31:37 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 12 21:33:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 21:33:55 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 12 21:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5c4c5b6a-001d-e26d-f4d4-23e598bc49a5 (at 10.9.103.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4518949800, cur 1562992498 expire 1562992348 last 1562992271 Jul 12 21:34:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 21:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 21:40:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 12 21:41:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f71ffc000, cur 1562992890 expire 1562992740 last 1562992663 Jul 12 21:41:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 12 21:41:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 21:41:40 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 21:41:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 21:41:57 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 12 21:44:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 21:44:03 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 12 21:51:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 21:51:46 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 12 21:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 21:52:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 12 21:54:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 21:54:25 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 21:54:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 12 21:54:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 22:02:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 22:02:05 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 12 22:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf2fec24-a441-2b9a-3334-0bc96ce2df5f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2255dbd400, cur 1562994130 expire 1562993980 last 1562993903 Jul 12 22:02:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 22:02:29 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 22:04:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 22:04:32 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 12 22:12:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 22:12:15 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 12 22:13:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 22:13:17 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 12 22:15:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 22:15:29 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 22:19:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 22:19:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 22:22:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 22:22:18 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 12 22:23:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 22:23:25 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 12 22:26:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 22:26:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 12 22:32:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 22:32:19 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 12 22:33:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 12 22:33:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 22:36:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 22:36:29 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 12 22:42:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 12 22:42:30 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 12 22:43:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 22:43:52 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 12 22:47:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 12 22:47:33 fir-md1-s1 kernel: Lustre: Skipped 33 
previous similar messages Jul 12 22:52:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 22:52:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 12 22:52:30 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 12 22:53:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 22:53:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 22:53:57 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 12 22:56:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 22:59:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 22:59:55 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 12 23:02:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 23:02:36 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 12 23:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 12 23:03:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 23:03:59 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 12 23:08:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 23:10:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 23:10:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 12 23:13:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 23:13:11 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 12 23:14:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 23:15:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 12 23:15:02 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 23:17:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 74e11ba4-980c-f875-a68c-e22360c64935 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2fa668f800, cur 1562998668 expire 1562998518 last 1562998441 Jul 12 23:17:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 23:21:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 23:21:17 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 12 23:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 23:23:19 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 12 23:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 23:25:04 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 12 23:27:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 23:27:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 12 23:31:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 12 23:31:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 12 23:32:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef2b19e1-b66e-f78f-ca40-ca13fb6d4d06 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24b6f9e000, cur 1562999520 expire 1562999370 last 1562999293 Jul 12 23:32:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 12 23:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 23:33:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 12 23:35:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 23:35:15 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 12 23:38:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 12 23:38:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 23:42:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 12 23:42:08 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 12 23:43:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 12 23:43:46 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 12 23:45:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 12 23:45:19 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 12 23:51:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 12 23:51:41 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 12 23:52:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 12 23:52:56 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 12 23:53:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 12 23:53:54 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 12 23:55:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 12 23:55:20 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 00:02:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 00:02:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 00:02:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 00:02:58 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 00:04:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 00:04:07 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 13 00:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 00:05:29 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 00:13:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 00:13:17 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 13 00:14:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 
10.8.10.21@o2ib6) Jul 13 00:14:12 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 00:15:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 00:15:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 13 00:20:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 00:20:03 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 00:24:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 00:24:25 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 13 00:25:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 00:25:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 00:27:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 00:27:42 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 13 00:31:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 00:31:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 00:34:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 00:34:31 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 00:35:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 00:35:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 13 00:37:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 00:37:43 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 00:44:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 00:44:33 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 13 00:45:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 00:45:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 00:45:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 00:45:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 00:47:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 00:47:45 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 13 00:54:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 00:54:35 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 13 00:56:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 00:56:10 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 00:56:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 00:56:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 00:57:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 00:57:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 13 01:04:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 01:04:35 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 13 01:06:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 01:06:15 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 01:08:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 01:08:18 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 01:08:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b39a470f-e258-0a6b-08d8-9a798f8b9f1c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd37d000, cur 1563005318 expire 1563005168 last 1563005091 Jul 13 01:08:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 01:08:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 01:08:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 01:14:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 01:14:37 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 01:16:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 01:16:41 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 13 01:19:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 01:19:31 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 01:24:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 01:24:42 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 13 01:26:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5d399386-b1fb-d405-e88f-f20c8d175a51 (at 10.8.25.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252367b000, cur 1563006385 expire 1563006235 last 1563006158 Jul 13 01:26:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 01:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 01:26:46 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 13 01:28:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 01:28:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 13 01:30:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 13 01:30:11 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 01:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 01:34:53 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 13 01:36:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 01:36:53 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 13 01:40:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef4e1fdc-8937-844b-21bf-e4b85d8fcd3a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28de8a9c00, cur 1563007208 expire 1563007058 last 1563006981 Jul 13 01:40:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 01:41:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 13 01:41:48 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 01:44:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 01:44:59 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 13 01:45:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 01:45:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 13 01:47:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 01:47:10 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 13 01:51:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 01:51:57 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 01:55:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 01:55:03 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 13 01:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 01:57:18 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 01:59:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 01:59:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 13 02:02:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 02:02:54 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 02:05:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 02:05:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 02:05:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0fdbf61400, cur 1563008715 expire 1563008565 last 1563008488 Jul 13 02:05:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 02:07:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 02:07:22 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 02:12:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 992f0c36-535d-31fb-df55-36ff304cdd4d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ef4fee800, cur 1563009133 expire 1563008983 last 1563008906 Jul 13 02:12:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 02:12:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 02:13:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 02:13:23 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 13 02:15:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 02:15:14 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 13 02:17:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 02:17:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 02:24:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 13 02:24:38 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 02:25:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 02:25:17 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar 
messages Jul 13 02:26:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 02:27:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 02:27:34 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 02:34:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 02:34:42 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 02:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 02:35:26 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 02:37:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 02:37:42 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 02:40:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 02:40:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 13 02:45:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 02:45:30 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 13 02:46:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 02:46:25 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 13 02:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 02:47:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 02:50:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f9af0770-c7bd-566c-affe-31bdf8c8eed6 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4535c7ec00, cur 1563011445 expire 1563011295 last 1563011218 Jul 13 02:50:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 02:56:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 02:56:03 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 13 02:57:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 02:57:22 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 02:58:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 02:58:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 02:59:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b5210801-eaf6-299e-958d-1d0d0937fe0b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f31f0194800, cur 1563011964 expire 1563011814 last 1563011737 Jul 13 02:59:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 03:04:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 03:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 03:06:14 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 03:08:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 03:08:29 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 13 03:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 03:08:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 03:16:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 03:16:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 13 03:16:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 03:16:15 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 03:19:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 03:19:06 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 03:19:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 03:19:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 03:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 03:26:15 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 13 03:26:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 03:29:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 13 03:29:43 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 13 03:29:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 03:29:47 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 03:35:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 03:35:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 03:36:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 03:36:24 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 03:40:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 03:40:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 03:40:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 03:40:44 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 03:46:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 03:46:45 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 13 03:50:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 03:50:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 13 03:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 03:50:52 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 03:56:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 03:56:55 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 13 04:00:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 04:00:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 13 04:01:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 04:01:00 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 04:01:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 04:01:33 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 04:03:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 04:04:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ecb5ad2c-7f68-a999-a141-4ba5fa8d702a (at 10.8.13.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2508cc5000, cur 1563015893 expire 1563015743 last 1563015666 Jul 13 04:04:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 04:06:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3b8a762400, cur 1563015993 expire 1563015843 last 1563015766 Jul 13 04:06:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 13 04:07:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 04:07:27 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 04:11:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 04:11:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 04:12:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 04:12:02 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 13 04:14:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 04:17:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 04:17:33 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 13 04:19:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 04:19:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 13 04:21:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 04:21:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 04:22:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 13 04:22:35 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 13 04:27:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 76e12dce-c40a-5a48-4e41-308e77527a3a (at 10.8.30.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f250d9c7400, cur 1563017223 expire 1563017073 last 1563016996 Jul 13 04:27:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 76e12dce-c40a-5a48-4e41-308e77527a3a (at 10.8.30.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f250da2b800, cur 1563017243 expire 1563017093 last 1563017016 Jul 13 04:27:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 13 04:27:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 04:27:49 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 13 04:30:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 04:30:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 04:32:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 04:32:07 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 04:32:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 04:32:48 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 04:38:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 04:38:02 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 13 04:40:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 04:40:38 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 13 04:42:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 04:42:10 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 04:42:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 04:42:49 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 13 04:47:41 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f39a3bc2c00, cur 1563018461 expire 1563018311 last 1563018234 Jul 13 04:48:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 04:48:08 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 13 04:51:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 04:51:12 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 13 04:53:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 04:53:05 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 04:53:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 04:53:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 04:58:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 04:58:39 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 13 05:02:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 05:02:03 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 13 05:03:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 05:03:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 05:03:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 05:03:53 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 13 05:08:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 05:08:47 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 05:12:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 05:12:27 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 13 05:13:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 05:13:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 05:15:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 05:15:30 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 13 05:19:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 05:19:12 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 05:23:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 05:23:02 fir-md1-s1 kernel: LustreError: Skipped 15 previous similar messages Jul 13 05:23:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b07aab000, cur 1563020593 expire 1563020443 last 1563020366 Jul 13 05:24:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 05:24:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 05:25:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 05:25:45 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 13 05:29:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 05:29:16 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 13 05:33:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 05:33:04 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 13 05:34:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 05:34:27 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 05:36:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 05:36:50 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 13 05:39:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 05:39:25 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 13 05:43:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 05:43:14 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 13 05:44:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 05:44:39 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 05:46:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 05:46:55 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 13 05:49:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 05:49:34 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 13 05:53:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 05:53:20 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 13 05:55:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 05:55:10 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 05:57:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 05:57:00 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 13 05:59:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 05:59:51 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 06:03:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 06:03:41 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 13 06:05:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 06:05:30 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 06:07:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 06:07:01 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 06:10:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 06:10:02 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 13 06:14:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 06:14:43 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 13 06:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 06:15:31 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 06:17:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 06:17:15 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 13 06:20:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 06:20:04 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 13 06:24:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 06:24:52 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 13 06:25:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 06:25:32 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 06:27:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 13 06:27:37 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 06:30:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 06:30:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 06:32:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5efe2f72-0a5c-fff4-b523-16ec38cac7e2 (at 10.9.103.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f252147f400, cur 1563024778 expire 1563024628 last 1563024551 Jul 13 06:35:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 06:35:25 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 13 06:35:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 06:35:45 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 06:38:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 06:38:55 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 06:40:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 06:40:27 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 13 06:45:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 06:45:59 fir-md1-s1 kernel: LustreError: Skipped 14 previous similar messages Jul 13 06:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 06:46:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 06:49:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 06:49:16 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 13 06:50:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 06:50:48 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 06:56:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 06:56:07 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 13 06:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 06:56:20 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 07:00:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 07:00:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 13 07:01:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 07:01:12 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 13 07:06:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 07:06:20 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 07:07:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 07:07:00 fir-md1-s1 kernel: LustreError: Skipped 16 previous similar messages Jul 13 07:10:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 07:10:47 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 07:11:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 07:11:14 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 13 07:16:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 07:16:27 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 13 07:17:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 07:17:43 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 13 07:21:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 07:21:31 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 13 07:21:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 07:21:31 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 13 07:26:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 07:26:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 07:27:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 07:27:48 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 13 07:29:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3266f5d400, cur 1563028179 expire 1563028029 last 1563027952 Jul 13 07:29:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 07:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 07:31:38 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 13 07:31:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 07:31:40 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 13 07:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 07:37:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 07:39:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 07:39:15 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 13 07:41:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 07:41:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 07:41:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 07:41:44 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 13 07:47:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 07:47:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 07:50:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 07:50:07 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 07:51:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 07:51:49 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 13 07:51:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 07:51:52 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 13 07:57:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 07:57:58 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 08:02:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 08:02:04 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 08:02:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 08:02:19 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 08:03:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 282d29b1-b17a-d2c6-8c52-58515f7a3b2a (at 10.9.101.38@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d7fa6b000, cur 1563030208 expire 1563030058 last 1563029981 Jul 13 08:04:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 08:04:08 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 13 08:08:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 08:08:12 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 08:12:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 08:12:17 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 13 08:13:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 08:13:02 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 13 08:14:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 08:14:14 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 13 08:18:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 08:18:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 08:22:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 08:22:22 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 13 08:24:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 08:24:09 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 08:24:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 08:24:15 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 13 08:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 08:28:25 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 08:32:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 08:32:25 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 13 08:34:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 08:34:17 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 13 08:34:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 08:34:34 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 08:38:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 08:38:48 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 08:43:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 08:43:02 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Jul 13 08:44:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 13 08:44:24 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 13 08:47:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 08:47:27 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 08:48:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 08:48:48 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 08:53:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 08:53:10 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 13 08:55:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 08:55:02 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 13 08:58:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 13 08:58:29 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 08:58:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 08:58:51 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 09:03:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 09:03:14 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 13 09:04:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b1ced5c00, cur 1563033890 expire 1563033740 last 1563033663 Jul 13 09:04:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 13 09:05:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 09:05:03 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 09:08:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 09:08:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 09:08:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 09:08:52 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 09:13:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 09:13:49 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 13 09:15:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 09:15:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 09:19:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 09:19:15 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 09:20:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 09:20:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 09:20:15 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 13 09:23:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 09:23:50 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 13 09:26:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 09:26:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 09:29:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 09:29:19 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 09:31:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 09:31:26 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 09:34:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 09:34:28 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 09:37:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 13 09:37:20 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 09:39:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 09:39:59 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 09:42:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 09:42:24 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 09:44:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 09:44:52 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 13 09:48:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 09:48:16 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 09:50:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 09:50:01 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 09:52:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 09:52:50 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 09:54:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 09:54:56 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 13 09:58:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 09:58:17 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 13 10:01:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 10:01:11 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 10:03:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 10:03:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 10:05:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 10:05:15 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 13 10:08:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 10:08:27 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 13 10:11:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 10:11:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 10:14:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 10:14:51 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 10:15:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 10:15:21 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 13 10:18:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 10:18:33 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 13 10:21:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 10:21:32 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 13 10:25:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 10:25:20 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 13 10:25:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 10:25:27 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 13 10:31:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 10:31:23 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 10:31:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 10:31:43 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 10:35:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 10:35:24 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 10:35:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 10:35:27 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 10:42:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 10:42:08 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 13 10:42:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 10:42:10 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 10:45:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 10:45:30 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 13 10:47:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 10:47:41 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 10:52:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 10:52:13 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 10:52:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 10:52:17 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 10:55:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 10:55:31 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 13 10:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 10:59:58 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 11:02:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 11:02:53 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 11:04:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 11:04:06 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 11:06:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 11:06:33 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 13 11:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 13 11:11:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 11:13:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 11:13:57 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 11:14:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 11:14:07 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 11:16:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 11:16:35 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 13 11:22:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 11:22:12 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 13 11:24:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 11:24:18 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 11:24:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 11:24:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 11:26:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 11:26:41 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 11:33:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 11:33:17 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 11:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 11:34:30 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 11:34:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 11:34:53 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 13 11:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 11:37:02 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 11:43:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 11:43:20 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 11:44:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 11:44:30 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 11:44:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 11:44:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 11:47:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 11:47:03 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 13 11:54:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 11:54:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 11:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 11:54:45 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 11:55:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 11:55:30 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 13 11:57:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 11:57:16 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 13 12:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 12:04:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 12:05:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 12:05:13 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 12:06:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 12:06:52 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 12:07:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 12:07:20 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 12:15:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 12:15:08 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 12:16:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 12:16:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 12:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 12:17:22 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 12:17:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 13 12:17:39 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 13 12:25:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 12:25:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 12:27:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 12:27:21 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 12:27:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 12:27:23 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 13 12:27:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 12:27:49 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 13 12:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 12:35:26 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 12:37:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 12:37:23 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 12:37:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 12:37:26 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 13 12:37:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 12:37:55 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 13 12:46:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 12:46:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 12:47:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 12:47:29 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 13 12:47:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 12:47:35 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 13 12:48:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 12:48:00 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 12:56:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 12:56:27 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 12:57:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 12:57:49 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 13 12:58:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 12:58:07 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 12:59:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 12:59:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 13:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 13:06:50 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 13:07:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 13:07:57 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 13 13:09:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 13:09:34 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 13 13:13:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 13:13:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 13:17:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 13:17:05 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 13:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 13:18:21 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 13 13:20:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 13:20:36 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 13:23:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 13:23:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 13:27:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 13:27:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 13:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 13:28:25 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 13:32:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 13:32:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 13 13:33:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 13:33:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 13:37:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 13:37:34 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 13:38:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 13:38:27 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 13 13:42:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 13:42:14 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 13:43:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 13:43:53 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 13:47:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 13:47:56 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 13:48:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 13:48:36 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 13 13:52:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 13:52:29 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 13 13:55:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 13:55:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 13:58:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 13:58:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 13 13:59:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 13:59:01 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 14:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d90ec5000, cur 1563051623 expire 1563051473 last 1563051396 Jul 13 14:03:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 14:03:12 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 14:06:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 14:06:11 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 14:08:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 14:08:19 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 14:09:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 14:09:12 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 13 14:13:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 14:13:16 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 14:16:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 14:16:32 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 13 14:18:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 14:18:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 14:19:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 14:19:19 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 13 14:24:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 14:24:53 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 14:27:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 14:27:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 14:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 14:28:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 13 14:29:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 14:29:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 14:30:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e9ca01000, cur 1563053435 expire 1563053285 last 1563053208 Jul 13 14:35:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 14:35:42 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 14:37:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 14:37:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 14:39:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 14:39:07 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 14:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 14:39:29 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 13 14:45:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 14:45:43 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 14:47:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 14:47:50 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 14:49:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 14:49:23 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 14:49:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 14:49:55 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 13 14:55:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 14:55:59 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 14:58:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 14:58:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 13 14:59:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 14:59:43 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 15:00:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 15:00:06 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 13 15:06:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 15:06:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 15:08:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 15:08:42 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 15:09:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 15:09:59 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 15:10:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 15:10:06 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 13 15:19:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 15:19:36 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 13 15:20:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 15:20:07 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 13 15:20:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 
10.8.10.21@o2ib6) reconnecting Jul 13 15:20:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 15:21:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 15:21:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 15:29:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 15:29:46 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 13 15:30:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 15:30:17 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 13 15:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 15:30:37 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 15:32:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 15:32:38 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 13 15:39:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 15:39:51 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 13 15:40:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 15:40:17 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 13 15:40:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 15:40:46 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 15:42:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 15:42:45 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 15:49:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 15:49:56 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 15:50:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 15:50:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 13 15:50:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 15:50:55 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 15:53:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 15:53:35 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 16:00:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 16:00:02 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 13 16:00:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 16:00:28 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 13 16:01:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 16:01:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 16:03:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 16:03:35 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 16:10:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 16:10:04 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 13 16:11:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 16:11:15 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 16:11:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 16:11:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 16:17:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 16:17:00 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 16:20:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 16:20:04 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 13 16:21:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 16:21:16 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 13 16:21:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 16:21:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 16:31:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 16:31:18 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 13 16:31:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 16:31:43 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 16:32:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 16:32:11 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 13 16:37:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 16:37:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 13 16:41:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 16:41:19 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 13 16:41:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 16:41:49 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 16:42:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 16:42:47 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 13 16:51:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 16:51:16 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 16:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 16:51:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 16:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 16:51:51 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 13 16:52:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 16:52:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 17:01:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 17:01:25 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 17:01:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 17:01:55 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 13 17:02:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 17:02:12 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 13 17:03:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 17:03:08 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 13 17:11:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 17:11:55 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 13 17:11:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 17:11:56 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 13 17:12:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 17:12:22 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 17:13:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 17:13:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 17:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 17:22:15 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 13 17:22:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 17:22:28 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 13 17:22:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 17:22:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 17:22:37 fir-md1-s1 kernel: Lustre: 23633:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563063750/real 1563063750] req@ffff8f345af3b900 x1636732585657632/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563063757 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 13 17:22:37 fir-md1-s1 kernel: Lustre: 23633:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Jul 13 17:25:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 17:25:25 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 13 17:28:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522e43400, cur 1563064091 expire 1563063941 last 1563063864 Jul 13 17:32:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 17:32:18 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 13 17:32:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 17:32:47 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 13 17:33:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 17:33:39 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 17:35:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 17:35:39 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 17:42:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 17:42:20 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 13 17:43:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 17:43:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 17:44:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 17:44:45 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 17:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 60f59e7b-5296-e995-71c3-01213d30e8c4 (at 10.8.24.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f1ed3800, cur 1563065090 expire 1563064940 last 1563064863 Jul 13 17:46:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 17:46:25 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 17:52:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 17:52:36 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 17:53:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 17:53:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 17:55:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 17:55:55 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 13 17:58:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 17:58:42 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 18:03:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 18:03:06 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 13 18:03:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 18:03:57 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 18:05:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 18:05:58 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 13 18:09:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 18:09:53 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 18:13:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 18:13:06 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 13 18:13:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 18:13:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 18:21:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 18:21:21 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 13 18:23:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 18:23:12 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 13 18:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 18:24:06 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 13 18:31:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 18:31:22 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 13 18:33:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 18:33:20 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 13 18:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 18:34:40 fir-md1-s1 kernel: Lustre: Skipped 
36 previous similar messages Jul 13 18:35:20 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1f2c4a0c00 x1634178254817744/t352284552582(0) o36->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:25/0 lens 488/3152 e 1 to 0 dl 1563068125 ref 2 fl Interpret:/0/0 rc 0/0 Jul 13 18:40:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 18:40:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 13 18:41:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 18:41:26 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 13 18:43:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 18:43:20 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 13 18:44:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 18:44:45 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 18:52:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 18:52:11 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 13 18:53:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 18:53:33 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 13 18:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 18:54:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 18:55:01 fir-md1-s1 kernel: Lustre: 
21411:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563069294/real 1563069294] req@ffff8f0ec188c500 x1636732624695296/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563069301 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 13 18:58:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 19:02:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 19:02:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 19:03:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 19:03:54 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 13 19:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 19:04:51 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 13 19:09:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 19:10:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 19:13:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 19:13:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 19:13:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 19:13:56 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 13 19:15:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 19:15:00 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 19:18:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 19:23:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 19:23:31 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 13 19:23:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 19:24:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 19:24:23 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 13 19:25:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 19:25:17 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 19:31:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 19:31:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 13 19:33:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 19:33:35 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 19:34:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 19:34:42 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 13 19:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 19:35:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 13 19:39:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 19:43:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 19:43:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 13 19:44:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 19:44:42 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 13 19:45:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 19:45:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 19:54:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 19:54:01 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 19:54:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 19:54:49 fir-md1-s1 kernel: Lustre: Skipped 74 
previous similar messages Jul 13 19:55:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 19:55:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 20:00:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 20:04:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 20:04:31 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 20:04:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 20:04:55 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 13 20:05:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 20:05:37 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 13 20:13:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 20:13:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 20:15:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 20:15:15 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 13 20:15:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 20:15:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 13 20:16:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 20:16:10 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 20:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 20:25:23 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 20:26:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 20:26:16 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 20:28:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 13 20:28:16 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 13 20:32:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 20:35:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 20:35:28 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 20:36:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 20:36:47 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 20:39:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 20:39:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 20:45:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 20:45:42 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 13 20:47:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 20:47:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 20:50:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 13 20:50:55 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 13 20:55:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 20:55:44 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 13 20:57:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 20:57:35 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 21:01:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 21:01:04 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 13 21:05:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 21:05:45 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 13 21:07:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 21:07:39 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 21:11:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 21:11:08 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 21:16:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 21:16:04 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 13 21:17:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 21:17:47 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 21:21:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 21:21:25 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 13 21:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 21:26:15 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 21:26:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 21:27:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 21:27:57 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 21:31:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 21:31:27 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 21:36:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 21:36:16 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 13 21:38:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 21:38:02 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 21:42:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 21:42:31 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 13 21:46:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 21:46:29 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 21:46:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 21:47:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 21:48:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 21:48:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 21:52:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 21:52:34 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 21:56:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 21:56:31 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 13 21:58:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 21:58:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 13 22:02:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 22:02:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 22:02:45 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 13 22:07:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 22:07:11 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 13 22:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 22:08:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 13 22:11:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 22:13:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 22:13:48 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 13 22:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 13 22:17:22 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 13 22:18:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 22:18:55 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 13 22:24:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 22:24:19 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 13 22:27:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 22:27:26 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 13 22:28:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 22:28:57 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 13 22:29:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 22:35:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 13 22:35:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 22:37:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 22:37:34 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 13 22:39:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 13 22:39:02 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 13 22:46:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 22:46:17 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 13 22:47:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 13 22:47:35 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 13 22:48:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 22:49:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 13 22:49:21 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 22:56:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 22:56:30 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 13 22:57:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 22:57:42 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 13 22:58:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 22:59:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 22:59:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 22:59:47 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 23:03:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 13 23:06:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 23:06:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 13 23:07:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 13 23:07:56 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 23:10:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 13 23:10:08 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 13 23:17:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 13 23:17:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 23:18:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 23:18:20 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 13 23:20:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 23:20:11 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 13 23:27:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 23:27:11 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 13 23:28:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 23:28:21 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 13 23:30:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 13 23:30:13 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 13 23:37:14 fir-md1-s1 kernel: Lustre: MGS: Received 
new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 23:37:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 13 23:38:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 13 23:38:48 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 13 23:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 23:40:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 13 23:47:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 13 23:47:50 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 13 23:49:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 23:49:01 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 13 23:49:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 13 23:50:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 13 23:50:36 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 13 23:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dc206ad9-6c70-6097-3407-cb9490b12136 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fc2bf000, cur 1563087236 expire 1563087086 last 1563087009 Jul 13 23:53:56 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 13 23:55:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 13 23:57:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 13 23:57:54 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 13 23:59:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 13 23:59:08 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 14 00:00:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 00:00:38 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 00:07:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 00:07:54 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 00:09:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 00:09:51 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 14 00:10:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 00:10:50 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 00:18:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 00:18:14 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 00:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 00:19:56 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 14 00:21:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 00:21:08 fir-md1-s1 kernel: Lustre: Skipped 31 
previous similar messages Jul 14 00:25:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 00:28:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 00:28:22 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 14 00:29:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 00:29:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 00:29:59 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 00:31:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 00:31:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 00:31:19 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 00:36:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 00:38:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 00:38:26 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 14 00:40:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 00:40:24 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 14 00:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 00:42:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 00:48:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 00:48:47 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 00:50:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 00:50:31 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 14 00:52:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 00:52:29 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 00:55:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 00:56:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 00:59:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 00:59:17 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 14 01:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 01:00:45 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 14 01:02:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 01:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 01:02:57 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 01:03:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 01:10:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 01:10:58 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 14 01:12:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 01:12:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 01:12:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 01:12:59 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 01:20:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 01:20:59 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 14 01:23:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 01:23:04 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 14 01:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 01:27:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 01:30:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 01:31:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 01:31:26 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 14 01:33:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 01:33:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 01:37:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 01:37:34 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 14 01:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 01:41:30 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 14 01:43:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 01:43:35 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 01:50:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 01:50:53 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 14 01:51:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 01:51:37 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 14 01:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 01:53:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 01:57:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 01:58:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 01:59:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 02:01:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 02:01:47 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 02:01:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 02:01:47 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 14 02:03:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 02:03:49 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 02:11:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 02:11:49 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 02:11:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 02:11:49 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 14 02:12:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 02:13:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 02:13:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 02:13:53 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 02:21:01 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 14 02:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 02:22:14 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 14 02:23:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 02:24:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 02:24:00 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 02:24:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 02:24:14 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 02:32:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 02:32:17 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 02:34:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 02:34:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 02:34:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 02:34:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 02:36:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e96f0f000, cur 1563096963 expire 1563096813 last 1563096736 Jul 14 02:36:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 02:42:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 02:42:32 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 02:44:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 02:44:26 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 14 02:44:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 02:44:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 14 02:52:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 02:52:41 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 14 02:54:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 02:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 02:54:32 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 02:54:32 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 14 03:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 03:02:57 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 14 03:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 03:05:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 14 
03:08:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 03:08:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 03:12:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 03:12:58 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 14 03:14:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 03:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 03:15:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 03:16:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 03:18:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 03:18:41 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 14 03:21:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 03:23:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 03:23:23 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 03:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 03:25:23 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 03:26:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 03:26:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 03:29:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f787d8800, cur 1563100157 expire 1563100007 last 1563099930 Jul 14 03:29:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 03:29:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 03:33:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 03:33:24 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 14 03:36:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 03:36:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 03:37:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 03:39:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 03:40:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 03:42:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 03:42:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 14 03:43:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 03:43:48 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 14 03:47:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 03:47:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 14 03:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 03:53:56 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 14 03:54:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 03:54:23 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 14 03:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 03:57:37 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 03:59:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 04:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 04:04:03 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 04:05:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 04:05:47 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 14 04:07:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 04:07:38 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 04:08:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 04:09:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 04:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 04:14:08 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 14 04:14:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 04:15:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 04:15:53 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 14 04:17:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 04:17:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 14 04:24:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 04:24:09 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 04:26:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 04:26:53 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 14 04:27:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 04:27:54 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 04:34:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 04:34:11 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 14 04:37:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 04:37:03 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 14 04:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 04:38:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 04:44:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 04:44:19 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 04:47:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP 
connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 04:47:40 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 04:48:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 04:48:39 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 04:51:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 04:54:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 04:54:25 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 14 04:56:30 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563105383/real 1563105383] req@ffff8f14b0f47b00 x1636732844994640/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563105390 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 04:57:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 04:57:53 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 14 04:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 04:58:57 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 05:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 05:04:31 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 14 05:07:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 05:07:57 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 
14 05:09:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 05:09:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 05:14:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 05:15:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 05:15:04 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 14 05:18:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 05:18:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 05:19:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 05:19:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 05:25:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 05:25:09 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 14 05:28:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 05:28:28 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 05:29:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 05:29:21 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 05:35:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 05:35:12 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 05:38:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP 
connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 14 05:38:36 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 05:39:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 05:39:25 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 05:41:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 05:42:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 05:45:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 05:45:15 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 14 05:48:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 05:48:53 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 14 05:49:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 05:49:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 05:53:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 05:54:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4815f99e-94fc-2359-c40b-ef5555f91d5e (at 10.9.113.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1028230400, cur 1563108853 expire 1563108703 last 1563108626 Jul 14 05:55:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 05:55:18 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 14 05:56:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 05:59:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 05:59:42 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 05:59:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 05:59:55 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 06:03:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 06:05:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 06:05:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 06:06:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 06:09:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 14 06:09:44 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 14 06:09:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 06:09:56 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 06:15:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 06:15:43 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 14 06:20:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 06:20:07 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 06:22:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 06:22:06 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 14 06:22:10 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563110523/real 1563110523] req@ffff8f0b86c40300 x1636732872785552/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563110530 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 06:25:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 06:25:43 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 14 06:30:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 06:30:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 06:32:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 06:32:07 fir-md1-s1 
kernel: Lustre: Skipped 15 previous similar messages Jul 14 06:32:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 06:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 06:35:46 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 14 06:40:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 06:40:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 06:42:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 06:42:15 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 14 06:45:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 06:45:53 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 14 06:47:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 06:50:03 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563112196/real 1563112196] req@ffff8f062633b600 x1636732881916080/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563112203 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 06:50:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 06:50:58 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 06:52:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 06:52:41 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 14 06:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 06:56:07 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 14 07:01:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 07:01:03 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 07:02:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 07:02:50 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 14 07:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 07:06:14 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 14 07:11:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 07:11:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 07:11:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 14 07:14:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 14 07:14:11 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 14 07:16:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 07:16:44 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 07:20:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 07:21:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 07:21:23 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 14 07:24:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 07:24:23 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 07:27:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 07:27:06 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 07:30:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 07:31:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 07:31:43 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 07:35:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 14 07:35:04 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 14 07:37:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 07:37:08 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 14 07:37:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 07:41:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 07:41:43 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 14 07:45:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 07:45:32 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 14 07:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 07:47:16 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 14 07:48:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 07:48:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 07:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 07:51:58 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 07:55:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 07:55:37 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 14 07:57:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 07:57:27 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 14 08:02:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 08:02:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 08:06:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 08:06:10 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 08:07:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 08:07:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 08:07:42 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 14 08:08:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 08:13:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 08:13:03 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 08:17:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 08:17:35 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 08:17:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 08:17:56 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 14 08:20:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 08:21:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 08:22:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 08:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 08:23:18 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 08:23:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 08:27:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 08:27:48 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 14 08:28:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 08:28:05 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 14 08:29:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 08:33:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 08:33:32 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 08:37:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f207fb8ec00, cur 1563118636 expire 1563118486 last 1563118409 Jul 14 08:37:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 08:38:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 08:38:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 08:38:07 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 14 08:38:07 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 08:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 08:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 08:43:33 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 08:48:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 08:48:33 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 14 08:49:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 08:49:28 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 08:52:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 08:53:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 08:53:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 08:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 08:58:35 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 14 08:59:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 14 08:59:54 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 09:03:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 09:03:50 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 09:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 09:08:44 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 14 09:09:31 fir-md1-s1 kernel: Lustre: 27321:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563120564/real 1563120564] req@ffff8f14426b6f00 x1636732935944528/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563120571 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 09:10:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 09:10:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 14 09:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 09:13:58 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 09:14:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 14 09:15:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 09:16:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 09:18:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 09:18:57 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 14 09:20:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 09:20:08 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 14 09:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 09:24:37 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 14 09:29:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 09:29:18 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 14 09:30:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 09:30:21 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 14 09:35:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 09:35:02 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 09:39:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 09:39:18 fir-md1-s1 kernel: Lustre: Skipped 69 previous 
similar messages Jul 14 09:40:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 09:40:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 09:40:27 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 09:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 09:45:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 09:48:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 09:49:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 09:49:20 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 14 09:49:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 09:50:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 09:51:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 09:51:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 09:55:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 09:55:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 09:59:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 09:59:23 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 14 10:02:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 10:02:30 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 14 10:05:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 10:05:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 10:09:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 10:09:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 10:09:53 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 14 10:10:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 10:12:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 10:12:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 10:15:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 10:15:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 10:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 10:19:56 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 14 10:20:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 10:23:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 10:23:13 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 10:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 10:26:24 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 10:30:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 10:30:05 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 14 10:33:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 10:33:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 10:34:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 14 10:34:39 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 10:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 10:36:31 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 10:40:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 10:40:09 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 14 10:44:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 10:44:42 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 14 10:46:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 10:46:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 10:50:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 10:50:16 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 10:54:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 10:54:45 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 14 10:56:03 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563126955/real 1563126955] req@ffff8f1461a47200 x1636732976480176/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563126962 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 10:57:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 10:57:07 fir-md1-s1 
kernel: Lustre: Skipped 26 previous similar messages Jul 14 10:58:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 235865aa-6c17-ab70-0ed1-9e86f8359a3f (at 10.9.107.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2524653400, cur 1563127128 expire 1563126978 last 1563126901 Jul 14 11:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 39fe18a4-a89c-1a84-3eb2-1fc3124ee4a0 (at 10.9.108.28@o2ib4) in 211 seconds. I think it's dead, and I am evicting it. exp ffff8f1476702400, cur 1563127204 expire 1563127054 last 1563126993 Jul 14 11:00:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 11:00:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 11:00:18 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 14 11:00:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 901eeaec-75a4-1e60-2c55-e9a045a13705 (at 10.9.108.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24e6d08c00, cur 1563127220 expire 1563127070 last 1563126993 Jul 14 11:00:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 11:04:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 11:04:47 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 14 11:07:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 11:07:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 11:10:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 11:10:30 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 14 11:17:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 11:17:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 14 11:17:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 11:17:54 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 14 11:19:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:20:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 11:20:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 11:20:32 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 14 11:23:10 fir-md1-s1 kernel: Lustre: 21411:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563128583/real 1563128583] req@ffff8f0a34a8a100 x1636732986989184/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563128590 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 11:23:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:27:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 11:27:23 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 11:28:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 11:28:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 11:30:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.20@o2ib4) Jul 14 11:30:43 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 14 11:33:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:34:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 11:37:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 11:37:25 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 11:38:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 11:38:14 fir-md1-s1 kernel: Lustre: Skipped 49143 previous similar messages Jul 14 11:39:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:40:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:40:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 11:40:44 fir-md1-s1 kernel: Lustre: Skipped 49196 previous similar messages Jul 14 11:45:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:48:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 11:48:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 11:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 11:48:22 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 11:48:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 11:50:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:50:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 11:50:44 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 14 11:57:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 11:58:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 11:58:13 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 14 11:58:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 11:58:25 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 14 11:58:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 11:58:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 12:00:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 12:00:53 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 12:08:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 14 12:08:34 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 12:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 12:08:51 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 12:10:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 12:10:54 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 12:12:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 12:12:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 12:13:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f28a9c2d400, cur 1563131583 expire 1563131433 last 1563131356 Jul 14 12:13:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 14 12:18:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 12:18:46 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 12:19:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 12:19:08 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 12:21:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 12:21:28 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 14 12:29:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 12:29:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 12:31:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 12:31:13 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 12:31:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 12:31:41 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 14 12:34:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 12:36:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 12:38:03 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563133076/real 1563133076] req@ffff8f08816fb300 x1636733016828064/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563133083 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 12:39:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 12:39:30 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 14 12:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 12:42:11 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 14 12:44:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 12:44:14 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 12:49:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 12:49:31 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 12:52:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 12:52:19 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 14 12:54:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 12:54:25 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 14 12:59:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 12:59:41 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 13:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 13:02:23 fir-md1-s1 
kernel: Lustre: Skipped 65 previous similar messages Jul 14 13:04:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 13:04:54 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 13:06:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 13:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 13:09:55 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 13:12:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 13:12:57 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 14 13:15:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 13:15:03 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 13:20:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 13:20:39 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 13:23:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 13:23:13 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 13:25:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 13:25:03 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 14 13:30:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 13:30:48 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 
14 13:33:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 13:33:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 13:33:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 13:33:26 fir-md1-s1 kernel: Lustre: Skipped 130 previous similar messages Jul 14 13:34:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 13:35:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 13:35:35 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 14 13:40:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 13:40:54 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 13:41:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 13:42:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 13:44:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 13:44:11 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 14 13:46:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 13:46:42 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 13:51:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 13:51:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 14 13:54:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 13:54:19 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 14 13:56:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 13:56:43 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 14 14:01:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 14:01:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 14:04:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 14:04:46 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 14 14:06:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 14:07:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 14:07:40 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 14 14:11:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 14:11:43 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 14:14:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 14:14:47 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 14:17:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 14:17:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 14:17:58 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 14 14:21:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 14:21:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 14:23:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 14:24:24 fir-md1-s1 kernel: Lustre: 10502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563139457/real 1563139457] req@ffff8f0810f4ad00 x1636733071020672/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563139464 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 14:24:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 14:24:48 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 14 14:30:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 14:30:50 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 14:32:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 14:32:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 14:34:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 14:35:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Jul 14 14:35:01 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 14 14:35:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.68@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 14:35:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 14:35:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 14:41:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 14:41:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 14 14:41:30 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 14:42:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 14:42:28 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 14:43:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 14:43:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 14:45:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 14:45:04 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 14 14:51:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 14:51:37 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 14:52:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 14:52:38 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 14:55:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 14:55:33 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 14 15:02:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 15:02:03 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 15:02:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 15:02:38 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 15:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 15:05:36 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 14 15:06:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 15:12:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 15:12:41 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 15:13:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 14 15:13:29 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 15:16:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 15:16:27 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 15:20:32 fir-md1-s1 kernel: Lustre: 23672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563142825/real 1563142825] req@ffff8f0a68e7e600 x1636733088896816/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563142832 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 14 15:22:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 15:22:57 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 14 15:24:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 15:24:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 14 15:26:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 15:26:30 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 14 15:30:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 15:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 15:33:24 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 15:36:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 15:36:57 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 15:37:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 14 15:37:22 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 15:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 15:43:33 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 15:47:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 15:47:02 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 15:48:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 15:48:13 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 15:50:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 15:52:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 15:53:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 15:53:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 15:54:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 15:54:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 15:54:07 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 15:57:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 15:57:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 14 15:58:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 15:58:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 15:58:40 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 14 15:59:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:00:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:01:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 16:03:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:03:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 14 16:04:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 16:04:14 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 16:07:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 16:07:11 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 14 16:09:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:09:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 14 16:09:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 16:09:02 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 14 16:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 16:14:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 16:17:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 16:17:12 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 14 16:19:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 16:19:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:19:35 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 14 16:19:35 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 14 16:25:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 16:25:05 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 16:27:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 16:27:12 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 14 16:29:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 16:29:46 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 14 16:35:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 16:35:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 16:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 16:37:13 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 14 16:39:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 16:39:55 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 14 16:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 16:45:23 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 16:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 16:47:16 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 14 16:50:24 
fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:50:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 14 16:51:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 16:51:20 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 14 16:52:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 16:52:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 16:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 16:56:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 16:57:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 16:57:18 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 14 17:01:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 17:01:30 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 14 17:06:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 17:06:19 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 17:07:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 17:07:26 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 14 17:09:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 14 17:11:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 17:11:36 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 14 17:16:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 17:16:24 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 14 17:17:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 17:17:40 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 14 17:21:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 17:21:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 17:25:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 17:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 17:26:42 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 14 17:28:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 17:28:00 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 14 17:28:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 17:28:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 17:29:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 17:29:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 17:31:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 17:33:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 17:33:54 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 14 17:34:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 17:36:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 17:36:53 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 17:38:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 17:38:03 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 17:43:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 17:43:55 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 17:47:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 17:47:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 17:47:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 17:48:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 17:48:06 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 14 17:54:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 17:54:39 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 17:58:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 17:58:10 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 17:58:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 17:58:10 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 14 18:04:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 18:04:45 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 18:05:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 18:08:43 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 14 18:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 18:08:43 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 14 18:12:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 18:13:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:16:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 18:16:02 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 14 18:17:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:18:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 18:18:45 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 14 18:18:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 18:18:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 18:18:55 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 14 18:26:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 18:26:08 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 18:28:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 18:28:46 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 14 18:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 18:29:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 14 18:33:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:33:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ff2293800, cur 1563154413 expire 1563154263 last 1563154186 Jul 14 18:35:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:36:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 18:36:46 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 18:37:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1f744ac0-b202-c1be-34d8-15a9e9bcd8e8 (at 10.8.25.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250593ec00, cur 1563154664 expire 1563154514 last 1563154437 Jul 14 18:39:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 18:39:00 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 14 18:39:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:40:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 18:40:27 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 14 18:45:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:46:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 18:46:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 18:46:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 14 18:47:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 18:49:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 18:49:05 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 14 18:50:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 18:50:53 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 18:57:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 18:57:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 18:59:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 18:59:06 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 14 19:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 19:01:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 14 19:03:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 19:07:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 19:07:45 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 14 19:09:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 19:09:16 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 14 19:12:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 19:12:24 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 14 19:17:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 19:17:50 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 14 19:19:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 19:19:17 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 14 19:23:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 19:23:04 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 14 19:28:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 19:28:03 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 19:28:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 19:29:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 19:29:35 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 14 19:33:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 19:33:40 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 14 19:38:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 19:38:03 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 14 19:39:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 19:39:37 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 14 19:43:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 19:43:56 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 19:48:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 19:48:14 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 14 19:49:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 19:49:42 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 14 19:52:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 19:54:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 19:54:10 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 19:55:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 19:57:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 19:58:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 19:59:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 19:59:14 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 14 19:59:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 19:59:44 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 20:01:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 20:02:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 20:04:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 20:04:25 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 20:08:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 20:09:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 20:09:45 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 14 20:12:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 20:12:06 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 14 20:14:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 20:14:36 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 20:14:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 20:18:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 20:19:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 20:19:48 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 14 20:24:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 14 20:24:10 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 14 20:24:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 20:24:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 20:24:47 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 14 20:28:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e3a95d5e-2945-1bb1-dd2c-d936b00a965b (at 10.8.10.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f3c25000, cur 1563161301 expire 1563161151 last 1563161074 Jul 14 20:28:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 20:30:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 20:30:31 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 14 20:31:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c55aa85e-9bb5-05f7-715c-4f84fb1a4539 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fca09b800, cur 1563161493 expire 1563161343 last 1563161266 Jul 14 20:31:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 20:31:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 20:31:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 14 20:35:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 20:35:00 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 20:35:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 20:35:14 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 14 20:40:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 20:40:40 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 14 20:41:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 20:41:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 14 20:45:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 20:45:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 14 20:45:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 20:45:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 20:50:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 20:50:51 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 14 20:53:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 20:53:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 14 20:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 20:55:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 21:00:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 21:00:34 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 21:01:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 21:01:06 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 14 21:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 21:05:29 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 14 21:07:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 21:07:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 14 21:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 21:11:12 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 14 21:12:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 21:12:38 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 14 21:15:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 21:15:47 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 21:19:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f13c52d4800, cur 1563164394 expire 1563164244 last 1563164167 Jul 14 21:19:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 14 21:21:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 21:21:14 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 14 21:24:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 21:24:20 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 14 21:25:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 21:25:22 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 14 21:25:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 21:25:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 21:31:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 21:31:17 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 14 21:35:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 21:35:23 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 21:35:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fbcb59c00, cur 1563165330 expire 1563165180 last 1563165103 Jul 14 21:36:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 21:36:12 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 14 21:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 14 21:41:29 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 14 21:41:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 21:41:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 14 21:45:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 21:45:27 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 21:46:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 21:46:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 21:51:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 21:51:41 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 14 21:52:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 21:52:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 14 21:55:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 21:55:35 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 21:56:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 21:56:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 14 22:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 22:01:43 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 14 22:03:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 22:03:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 22:05:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 14 22:05:49 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 22:06:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 22:06:44 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 14 22:11:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 22:11:49 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 14 22:16:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 22:16:12 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 14 22:16:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 22:16:55 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 14 22:18:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 22:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0ea72b5e-3a2b-5bb2-d7d0-9add5a9dde42 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ec72a5800, cur 1563167962 expire 1563167812 last 1563167735 Jul 14 22:19:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f66bab8e-e08e-c0b5-8b49-2c1c5ad402c5 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1fa82e2800, cur 1563167966 expire 1563167816 last 1563167739 Jul 14 22:19:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 14 22:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 22:21:55 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 14 22:26:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 22:26:39 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 14 22:26:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 22:26:58 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 22:31:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 22:31:59 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 14 22:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 14 22:37:01 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 14 22:39:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 22:39:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 14 22:40:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 22:40:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 14 22:42:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 22:42:00 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 14 22:44:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 22:47:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 22:47:35 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 14 22:49:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 22:49:37 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 14 22:51:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 22:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 22:52:02 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 14 22:58:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 22:58:02 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 14 22:59:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 22:59:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 14 23:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 23:02:13 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 14 23:06:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 23:06:48 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 14 23:08:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 14 23:08:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 23:10:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 14 23:10:17 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 14 23:12:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 23:12:16 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 14 23:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 23:18:31 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 14 23:19:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 23:19:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 14 23:20:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 23:20:32 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 14 23:22:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 14 23:22:18 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 14 23:28:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 23:28:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 14 23:32:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 23:32:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 14 23:32:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 14 23:32:27 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 14 23:35:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 14 23:35:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 14 23:38:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 14 23:38:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 14 23:42:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 23:42:21 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 14 23:42:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 14 23:42:53 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 14 23:48:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 23:48:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 14 23:48:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 14 23:48:51 fir-md1-s1 kernel: Lustre: Skipped 3715 previous similar messages Jul 14 23:52:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 14 23:52:29 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 14 23:53:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 14 23:53:11 fir-md1-s1 kernel: Lustre: Skipped 3741 previous similar messages Jul 14 23:53:57 fir-md1-s1 kernel: Lustre: 23706:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563173630/real 1563173630] req@ffff8f1d9c98c200 x1636733261436736/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563173637 ref 1 fl Rpc:X/0/ffffffff 
rc 0/-1 Jul 14 23:59:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 14 23:59:05 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 14 23:59:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 14 23:59:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 00:03:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 00:03:26 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 15 00:03:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 00:03:29 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 15 00:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 00:09:44 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 15 00:10:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 00:10:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 00:13:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 00:13:59 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 15 00:14:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 00:14:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 00:20:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 00:20:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 00:21:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 00:21:09 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 15 00:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 00:24:07 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 15 00:25:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 00:25:00 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 00:30:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 00:30:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 00:31:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 00:31:21 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 15 00:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 00:34:18 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 15 00:35:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 00:35:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 00:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8a2e0e99-b7e2-2b0e-9dbb-18a669bd784a (at 10.9.105.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f350c6ed000, cur 1563176222 expire 1563176072 last 1563175995 Jul 15 00:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 00:40:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 15 00:42:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 00:42:25 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 15 00:44:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 00:44:19 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 15 00:46:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 00:46:10 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 15 00:50:12 fir-md1-s1 kernel: Lustre: 23561:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563177005/real 1563177005] req@ffff8f0b15b83c00 x1636733279194336/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563177012 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 00:50:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 00:50:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 00:52:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 00:52:50 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 15 00:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 00:54:50 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 15 00:56:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 00:56:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 01:00:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 01:00:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 01:02:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 01:02:52 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 15 01:05:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 01:05:17 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 15 01:07:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 01:07:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 01:10:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 01:10:47 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 01:12:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 01:12:56 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 15 01:15:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 01:15:39 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 15 01:17:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 01:17:57 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 15 01:20:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 01:20:48 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 01:23:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 01:23:19 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 15 01:25:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 01:25:43 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 15 01:28:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 01:28:03 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 15 01:30:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 01:30:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 01:33:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 01:33:57 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 15 01:36:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 01:36:03 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 15 01:38:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 01:38:46 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 15 01:41:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 01:41:01 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 01:44:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 01:44:48 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 15 01:46:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 01:46:22 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 15 01:50:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 01:50:49 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 01:51:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 01:51:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 01:54:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 01:54:53 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 15 01:56:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 01:56:23 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 15 01:56:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1dedc57800, cur 1563180994 expire 1563180844 last 1563180767 Jul 15 01:56:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 15 02:00:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 02:00:53 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 15 02:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 02:01:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 02:05:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 02:05:09 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 15 02:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 02:06:29 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 15 02:12:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 02:12:26 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 15 02:12:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 02:12:27 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 02:15:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 02:15:41 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 15 02:16:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 02:16:29 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 15 02:22:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 02:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 02:22:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 02:22:33 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 02:26:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 02:26:16 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 15 02:26:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 02:26:43 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 15 02:33:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 02:33:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 02:33:07 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 02:33:07 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 02:37:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 02:37:31 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 15 02:37:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 02:37:46 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 15 02:43:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 02:43:28 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 02:43:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 02:43:55 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 15 02:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 02:47:34 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 02:48:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 02:48:23 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 15 02:53:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 02:53:33 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 15 02:53:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 02:53:56 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 15 02:57:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 02:57:57 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 15 02:58:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 02:58:41 fir-md1-s1 kernel: LustreError: Skipped 15 previous similar messages Jul 15 03:03:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 03:03:45 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 03:05:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 03:05:19 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 15 03:08:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 03:08:03 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 15 03:13:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 03:13:55 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 03:15:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 03:15:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 03:15:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 03:15:49 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 15 03:18:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 03:18:11 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 15 03:24:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 03:24:00 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 15 03:26:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 03:26:50 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 03:27:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 03:27:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 03:28:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 03:28:11 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 15 03:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 03:34:00 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 03:37:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 03:37:02 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 15 03:37:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 03:37:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 03:38:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 03:38:16 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 15 03:44:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 03:44:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 15 03:47:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 03:47:55 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 15 03:48:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 03:48:17 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 15 03:48:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 03:48:53 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 15 03:54:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 03:54:13 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 03:58:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 03:58:30 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 03:58:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 03:58:40 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 15 04:04:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 04:04:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 04:06:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 23a2ad6b-df40-bb2e-b9a6-1311fa9a1b7e (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f330f01ac00, cur 1563188795 expire 1563188645 last 1563188568 Jul 15 04:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 04:08:33 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 15 04:10:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 04:10:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 04:10:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 04:10:51 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 15 04:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 04:14:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 04:16:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 04:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 04:18:45 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 15 04:20:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 04:20:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 04:20:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 04:20:55 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 15 04:24:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 04:24:49 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 04:26:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 04:26:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 04:28:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 04:28:48 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 15 04:31:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 04:31:01 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 15 04:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 04:35:01 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 15 04:36:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 04:36:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 04:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 04:39:04 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 15 04:41:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 04:41:07 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 15 04:45:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 04:45:01 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 15 04:48:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 04:48:08 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 04:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 04:49:06 fir-md1-s1 kernel: Lustre: Skipped 132 previous similar messages Jul 15 04:52:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 04:52:10 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 15 04:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 04:55:27 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 15 04:59:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 04:59:16 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 15 05:02:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 05:02:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 05:02:11 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 15 05:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 05:05:36 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 05:09:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 05:09:16 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 15 05:13:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 05:13:36 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 15 05:16:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 05:16:12 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 05:19:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 05:19:24 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 15 05:23:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2fe090e400, cur 1563193423 expire 1563193273 last 1563193196 Jul 15 05:23:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 15 05:23:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 05:23:57 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 05:26:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 05:26:58 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 05:29:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 05:29:32 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 15 05:33:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 05:33:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 05:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 05:37:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 05:37:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 05:37:08 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 05:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 05:39:54 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 15 05:40:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 05:40:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 05:46:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 05:47:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 05:47:10 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 15 05:47:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 05:47:11 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 15 05:50:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 05:50:05 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 15 05:52:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 05:57:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 05:57:20 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 05:58:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 05:58:13 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 15 06:00:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 06:00:12 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 15 06:07:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 06:07:44 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 06:09:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 06:09:56 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 15 06:10:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 06:10:20 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 15 06:16:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 06:16:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 06:18:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 06:18:15 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 06:20:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 15 06:20:04 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 15 06:20:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 06:20:26 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 15 06:26:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 06:28:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 06:28:58 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 15 06:30:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 06:30:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 15 06:30:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ecfc1c400, cur 1563197409 expire 1563197259 last 1563197182 Jul 15 06:30:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 06:30:35 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 15 06:34:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 15 06:39:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 06:39:23 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 15 06:40:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 06:40:06 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 15 06:40:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 06:40:36 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 15 06:41:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 06:41:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 06:49:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 06:49:24 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 06:50:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 06:50:17 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 15 06:51:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 06:51:15 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 06:57:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 06:57:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 06:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 06:59:28 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 07:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 07:02:24 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 15 07:05:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4449627400, cur 1563199550 expire 1563199400 last 1563199323 Jul 15 07:06:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 07:06:03 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 15 07:07:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 07:07:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 07:09:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 07:09:45 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 07:12:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 07:12:53 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 15 07:16:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 07:16:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 15 07:18:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 07:18:40 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 15 07:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 07:20:54 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 07:23:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 07:23:02 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 15 07:26:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 07:26:46 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 15 07:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 07:31:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 07:33:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Jul 15 07:33:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 07:33:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 07:33:04 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 15 07:37:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 07:37:47 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 07:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 07:41:12 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 15 07:43:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 07:43:51 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 15 07:47:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 07:47:50 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 07:49:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 07:49:41 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 07:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 07:51:26 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 07:54:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 07:54:08 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 15 07:58:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 07:58:23 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 15 08:00:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 08:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 08:01:33 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 15 08:04:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 08:04:20 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 15 08:10:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 08:10:15 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 15 08:11:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 08:11:35 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 15 08:14:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 15 08:14:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 08:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 08:14:32 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 15 08:21:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 08:21:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 08:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 08:21:44 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 08:24:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 08:24:50 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 15 08:30:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 08:30:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 08:31:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 08:31:52 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 08:33:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 08:33:42 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 15 08:34:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 08:34:56 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 15 08:35:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1de7e58000, cur 1563204925 expire 1563204775 last 1563204698 Jul 15 08:42:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 08:42:16 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 15 08:45:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 08:45:14 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 15 08:45:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 08:45:55 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 15 08:51:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 08:51:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 08:52:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 08:53:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 08:53:13 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 15 08:56:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 08:56:01 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 15 08:56:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 08:56:01 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 15 09:03:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 09:03:38 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 15 09:05:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 09:06:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 09:06:02 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 15 09:06:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 09:06:02 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 09:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 09:14:02 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 09:16:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 09:16:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 09:17:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 09:17:35 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Jul 15 09:18:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 09:18:29 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 15 09:24:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 09:24:25 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 15 09:27:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 09:27:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 09:27:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 09:27:46 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 15 09:28:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 09:28:48 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 15 09:34:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 09:34:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 15 09:37:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 09:37:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 09:37:58 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 15 09:38:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 09:38:52 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 09:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 09:44:53 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 09:48:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 09:48:09 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 15 09:49:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 09:49:25 fir-md1-s1 kernel: Lustre: 
Skipped 68 previous similar messages Jul 15 09:55:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 09:55:15 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 09:58:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 09:58:13 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 15 09:59:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 09:59:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 10:04:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 10:04:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 10:05:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 10:05:28 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 10:05:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 10:05:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 10:08:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 10:08:24 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 15 10:11:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 10:11:18 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 15 10:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 10:15:49 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 15 10:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 10:18:43 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 15 10:21:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 10:21:25 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 15 10:26:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 10:26:27 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 10:28:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3671f01000, cur 1563211693 expire 1563211543 last 1563211466 Jul 15 10:28:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 10:28:47 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 15 10:33:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 15 10:33:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 10:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 10:36:31 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 10:38:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 10:38:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 10:38:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 10:38:51 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 15 10:41:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 10:42:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 10:43:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 10:43:09 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 15 10:44:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 10:44:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 10:46:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 10:46:40 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 10:48:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 10:48:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 10:49:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 10:49:09 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 15 10:53:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 10:53:33 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 15 10:55:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 10:55:41 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 10:56:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 10:56:42 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 10:59:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 10:59:11 fir-md1-s1 kernel: Lustre: Skipped 126 previous similar messages Jul 15 11:03:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 11:03:33 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 15 11:07:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 11:07:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 11:09:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 11:09:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 15 11:09:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 11:09:35 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 15 11:13:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 11:13:33 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 15 11:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 11:17:39 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 11:19:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 11:19:43 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 15 11:24:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 11:24:02 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 15 11:24:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 11:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eef76c71-6455-3c9c-c2bd-e13c2b066def (at 10.8.30.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f15d3c00, cur 1563215104 expire 1563214954 last 1563214877 Jul 15 11:27:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 11:27:52 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 15 11:30:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 11:30:02 fir-md1-s1 kernel: Lustre: Skipped 141 previous similar messages Jul 15 11:34:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 11:34:03 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 15 11:35:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 11:35:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 11:37:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 11:37:55 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 11:40:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 11:40:06 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 15 11:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5048051f-aacc-10b9-d9da-eb27fb049919 (at 10.9.104.25@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24f8ad4800, cur 1563216179 expire 1563216029 last 1563215952 Jul 15 11:42:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 15 11:44:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 11:44:04 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 15 11:44:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 018b4088-9100-7f5b-2709-38dd7f461ac7 (at 10.8.8.29@o2ib6) in 171 seconds. I think it's dead, and I am evicting it. exp ffff8f2501a69400, cur 1563216255 expire 1563216105 last 1563216084 Jul 15 11:44:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 15 11:45:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 887b140e-8cff-f857-a016-9d4798eb3a24 (at 10.8.8.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1505cb2c00, cur 1563216311 expire 1563216161 last 1563216084 Jul 15 11:45:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 15 11:46:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6e5f9da5-7e02-257b-d9c3-9ff6edd45e41 (at 10.9.104.25@o2ib4) in 180 seconds. I think it's dead, and I am evicting it. exp ffff8f364fef9400, cur 1563216387 expire 1563216237 last 1563216207 Jul 15 11:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8a6403b0-19b9-9d96-c101-52e3001fff6c (at 10.9.104.25@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3d0b231800, cur 1563216437 expire 1563216287 last 1563216210 Jul 15 11:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 11:48:02 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 15 11:50:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 11:50:12 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 15 11:54:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 11:54:13 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 11:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 91e21a4a-f1ae-e50e-7e41-21aa1b29cf61 (at 10.9.113.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1cf1ce8800, cur 1563216885 expire 1563216735 last 1563216658 Jul 15 11:54:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 15 11:55:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 11:58:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 11:58:09 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 12:00:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 12:00:18 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 15 12:02:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 33678df3-cbf6-7b66-f13e-728347cfb474 (at 10.9.113.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fa331000, cur 1563217341 expire 1563217191 last 1563217114 Jul 15 12:02:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 15 12:04:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 12:04:18 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 15 12:07:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 12:07:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 12:08:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 12:08:16 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 12:10:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 12:10:21 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 15 12:14:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 12:14:31 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 15 12:18:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 12:18:17 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 12:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 12:20:21 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 15 12:24:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 12:24:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 12:27:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 12:27:17 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 15 12:28:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 12:28:19 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 12:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 12:30:34 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 15 12:37:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 12:37:23 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 15 12:38:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 12:38:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 12:40:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 12:40:36 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 15 12:42:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 12:47:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 12:47:23 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 15 12:48:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 12:48:56 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 15 12:50:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 12:50:36 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 15 12:56:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 12:56:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 12:58:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 12:58:10 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 15 12:59:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 12:59:13 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 15 13:00:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 13:00:54 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 15 13:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15065bb800, cur 1563221136 expire 1563220986 last 1563220909 Jul 15 13:05:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 15 13:08:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 13:08:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 13:08:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 13:08:33 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 15 13:09:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 13:09:16 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 13:11:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 13:11:04 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 15 13:16:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f281ff36c00, cur 1563221794 expire 1563221644 last 1563221567 Jul 15 13:18:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 13:18:33 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 15 13:19:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 13:19:17 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 13:19:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 13:19:38 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 13:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 13:21:21 fir-md1-s1 kernel: Lustre: Skipped 123 previous similar messages Jul 15 13:29:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 13:29:25 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 15 13:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 13:29:41 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 15 13:30:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 13:30:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 13:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 13:31:27 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 15 13:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 13:39:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 13:40:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 13:40:21 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 15 13:41:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 13:41:31 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 15 13:45:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 13:45:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 13:49:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 13:49:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 13:51:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 13:51:42 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 15 13:51:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 13:51:48 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 15 13:55:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 13:55:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 14:00:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 14:00:18 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 15 14:01:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 14:01:50 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 15 14:02:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 14:02:25 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 15 14:10:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 14:10:26 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 14:11:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 14:11:54 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 15 14:12:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 14:12:41 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 15 14:20:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 14:20:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 14:20:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 14:20:52 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 14:22:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 14:22:21 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 15 14:22:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 14:22:50 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 15 14:30:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 14:30:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 14:30:55 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 14:32:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 14:32:27 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 15 14:34:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 14:34:24 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 15 14:39:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 14:39:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 14:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 14:41:03 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 14:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 14:42:31 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 15 14:46:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 14:46:16 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 15 14:50:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 14:50:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 14:51:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 14:51:17 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 14:52:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 14:52:31 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 15 14:56:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 14:56:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 15:01:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 15:01:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 15:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 15:02:43 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 15 15:04:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 15:04:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 15:07:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 15:07:17 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 15:11:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 15:11:33 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 15:13:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 15:13:01 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 15 15:17:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 15:17:29 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 15:18:38 fir-md1-s1 kernel: Lustre: 23697:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563229111/real 0] req@ffff8f360672dd00 x1636733705317840/t0(0) o104->fir-MDT0002@10.8.7.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229118 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 15:18:40 fir-md1-s1 kernel: LustreError: 46522:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f34c8aa5850 x1638236447926464/t0(0) o4->e53089e0-0379-2982-632f-afbd57f75e4f@10.8.2.32@o2ib6:23/0 lens 504/448 e 1 to 0 dl 1563229133 ref 1 fl Interpret:/0/0 rc 0/0 Jul 15 15:18:41 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563229114/real 0] req@ffff8f0db4e5ad00 x1636733705353616/t0(0) o106->fir-MDT0000@10.8.29.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563229121 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 15:18:41 fir-md1-s1 kernel: Lustre: 23598:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous 
similar messages Jul 15 15:18:42 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:18:43 fir-md1-s1 kernel: Lustre: 20463:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563229116/real 0] req@ffff8f3177b75400 x1636733705365360/t0(0) o104->fir-MDT0000@10.8.28.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229123 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 15:18:43 fir-md1-s1 kernel: Lustre: 20463:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 33 previous similar messages Jul 15 15:18:43 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:18:43 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f184b563850 x1638886645534656/t0(0) o4->666b60d6-ed92-c98b-c78c-4bfc3f3e7231@10.8.16.2@o2ib6:23/0 lens 504/448 e 1 to 0 dl 1563229133 ref 1 fl Interpret:/0/0 rc 0/0 Jul 15 15:18:43 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33d8ec6000 Jul 15 15:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with e53089e0-0379-2982-632f-afbd57f75e4f (at 10.8.2.32@o2ib6), client will retry: rc = -110 Jul 15 15:18:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 15 15:18:44 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bb0529c00 Jul 15 15:18:45 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:18:45 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 15 15:18:45 fir-md1-s1 kernel: Lustre: 
23714:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a223a8600 x1638276954421856/t352605293928(0) o36->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:20/0 lens 504/2888 e 1 to 0 dl 1563229130 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 15:18:47 fir-md1-s1 kernel: Lustre: 23733:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a223af800 x1637883407800928/t0(0) o101->aa3ee41d-cac0-6749-5220-bb62e9eebc36@10.8.28.5@o2ib6:21/0 lens 576/3264 e 1 to 0 dl 1563229131 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 15:18:48 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:18:49 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f056c851200 x1631542709475504/t352605286701(0) o36->903c51ef-2159-9907-073d-897a3f432dcf@10.9.109.11@o2ib4:24/0 lens 488/3152 e 1 to 0 dl 1563229134 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 15:18:49 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 15 15:18:51 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0db4e58000 x1635204891654560/t0(0) o101->f6ea22f6-446c-b33a-7f85-ddd4280dae8d@10.9.101.23@o2ib4:26/0 lens 576/3264 e 1 to 0 dl 1563229136 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 15:18:51 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 36 previous similar messages Jul 15 15:18:52 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f2a223af800 x1637883407800928/t0(0) o101->aa3ee41d-cac0-6749-5220-bb62e9eebc36@10.8.28.5@o2ib6:21/0 lens 576/536 e 1 to 0 dl 1563229131 ref 1 fl Complete:/0/0 rc 0/0 Jul 15 15:18:52 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 53848 previous similar messages Jul 15 15:18:53 fir-md1-s1 kernel: Lustre: 20463:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563229126/real 1563229126] req@ffff8f2de4b91e00 x1636733705377328/t0(0) o104->fir-MDT0000@10.8.10.36@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229133 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 15:18:53 fir-md1-s1 kernel: Lustre: 20463:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 488 previous similar messages Jul 15 15:18:55 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f08f1135d00 x1638894534705040/t0(0) o101->70f17c05-8e9e-e3e3-0fb3-adadf2c8b10a@10.9.103.22@o2ib4:0/0 lens 480/0 e 1 to 0 dl 1563229140 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 15 15:18:55 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 357 previous similar messages Jul 15 15:19:01 fir-md1-s1 kernel: Lustre: 23598:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f08440f3600 x1634161738022496/t0(0) o101->32315fe6-6915-bd82-691a-5460d13ab6db@10.9.103.27@o2ib4:29/0 lens 480/0 e 1 to 0 dl 1563229139 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:19:02 fir-md1-s1 kernel: Lustre: 23607:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:12s); client may timeout. 
req@ffff8f2a223a8600 x1638276954421856/t352605293928(0) o36->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:20/0 lens 504/424 e 1 to 0 dl 1563229130 ref 1 fl Complete:/0/0 rc 0/0 Jul 15 15:19:03 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f08f1131500 x1631742464335136/t0(0) o101->9101e47c-5087-9ebf-bb20-6ff2bf817bf0@10.9.101.32@o2ib4:8/0 lens 576/0 e 1 to 0 dl 1563229148 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 15 15:19:03 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 993 previous similar messages Jul 15 15:19:08 fir-md1-s1 kernel: LustreError: 46524:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f184b564c50 x1637882170889552/t0(0) o4->f7faac5e-5757-f826-f11b-7d0a6430dabe@10.8.8.27@o2ib6:14/0 lens 488/448 e 1 to 0 dl 1563229154 ref 1 fl Interpret:/0/0 rc 0/0 Jul 15 15:19:09 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:19:09 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 15 15:19:09 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24e360d200 Jul 15 15:19:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f7faac5e-5757-f826-f11b-7d0a6430dabe (at 10.8.8.27@o2ib6), client will retry: rc = -110 Jul 15 15:19:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 15 15:19:14 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 40s: evicting client at 10.8.8.18@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f09644198c0/0x5d9ee640c35e808a lrc: 3/0,0 mode: PR/PR res: [0x2c002c4ce:0x13d1f:0x0].0x0 bits 0x1b/0x0 rrc: 34 type: IBT flags: 0x60200400000020 nid: 
10.8.8.18@o2ib6 remote: 0xbd8ddbfa7a81dce2 expref: 15305 pid: 23685 timeout: 2344214 lvb_type: 0 Jul 15 15:19:14 fir-md1-s1 kernel: Lustre: 23691:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:15s); client may timeout. req@ffff8f08ff828000 x1631567180206448/t0(0) o101->35fe08e4-c10b-c2c7-284d-8125b5106002@10.9.107.3@o2ib4:29/0 lens 576/0 e 1 to 0 dl 1563229139 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:19:14 fir-md1-s1 kernel: LustreError: 21452:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.22.34@o2ib6: deadline 30:5s ago req@ffff8f28d63ea100 x1631646493744000/t0(0) o101->f03aa5e8-f764-2262-c217-2e99830bfe5f@10.8.22.34@o2ib6:9/0 lens 576/0 e 0 to 0 dl 1563229149 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:19:14 fir-md1-s1 kernel: LustreError: 21452:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 36 previous similar messages Jul 15 15:19:15 fir-md1-s1 kernel: Lustre: 23691:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 122 previous similar messages Jul 15 15:19:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 40s: evicting client at 10.8.22.33@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f20ade33a80/0x5d9ee640c379ca6e lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 901 type: IBT flags: 0x60200400000020 nid: 10.8.22.33@o2ib6 remote: 0xc3eed59f75023b34 expref: 8 pid: 97642 timeout: 2344215 lvb_type: 0 Jul 15 15:19:17 fir-md1-s1 kernel: LustreError: 22285:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.7.27@o2ib6: deadline 30:8s ago req@ffff8f16c9304500 x1631578064405024/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:9/0 lens 584/0 e 0 to 0 dl 1563229149 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:19:17 fir-md1-s1 kernel: LustreError: 
22285:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 17 previous similar messages Jul 15 15:19:19 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06ac52ad00 x1634178006941776/t0(0) o101->d82be57b-2f2b-1591-b61e-7d36849f0064@10.9.109.71@o2ib4:24/0 lens 576/0 e 1 to 0 dl 1563229164 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 15 15:19:19 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2114 previous similar messages Jul 15 15:19:43 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f2811aa0c50 x1637882170889552/t0(0) o4->f7faac5e-5757-f826-f11b-7d0a6430dabe@10.8.8.27@o2ib6:13/0 lens 488/448 e 1 to 0 dl 1563229183 ref 1 fl Interpret:/2/0 rc 0/0 Jul 15 15:19:43 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 15 previous similar messages Jul 15 15:19:51 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f056c856900 x1638088205671680/t0(0) o101->9901f7bd-3861-a1cb-77e0-01bd9d079c38@10.9.110.3@o2ib4:26/0 lens 576/0 e 0 to 0 dl 1563229196 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 15 15:19:51 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2481 previous similar messages Jul 15 15:20:01 fir-md1-s1 kernel: LustreError: 23671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229111, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f320ee8b840/0x5d9ee640c378105d lrc: 3/0,1 mode: --/CW res: [0x2c002c39f:0x28a7:0x0].0x0 bits 0x2/0x0 rrc: 501 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23671 timeout: 0 lvb_type: 0 Jul 15 15:20:01 fir-md1-s1 kernel: LustreError: 
23671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 15 15:20:05 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229115, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f320e134ec0/0x5d9ee640c37ae213 lrc: 3/1,0 mode: --/PR res: [0x2c002c39f:0x28a7:0x0].0x0 bits 0x13/0x0 rrc: 501 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23704 timeout: 0 lvb_type: 0 Jul 15 15:20:05 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Jul 15 15:20:05 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:20:05 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 15 15:20:05 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2889f84600 Jul 15 15:20:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f7faac5e-5757-f826-f11b-7d0a6430dabe (at 10.8.8.27@o2ib6), client will retry: rc = -110 Jul 15 15:20:05 fir-md1-s1 kernel: Lustre: 21987:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:22s); client may timeout. 
req@ffff8f2811aa0c50 x1637882170889552/t0(0) o4->f7faac5e-5757-f826-f11b-7d0a6430dabe@10.8.8.27@o2ib6:13/0 lens 488/448 e 1 to 0 dl 1563229183 ref 1 fl Complete:/2/ffffffff rc -110/-1 Jul 15 15:20:05 fir-md1-s1 kernel: Lustre: 21987:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 635 previous similar messages Jul 15 15:20:06 fir-md1-s1 kernel: LustreError: 23665:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229116, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2c943a1f80/0x5d9ee640c37bcff0 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 904 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23665 timeout: 0 lvb_type: 0 Jul 15 15:20:06 fir-md1-s1 kernel: LustreError: 23665:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 91 previous similar messages Jul 15 15:20:08 fir-md1-s1 kernel: LustreError: 21414:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229118, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f12dfa37500/0x5d9ee640c37d6c1e lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 904 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21414 timeout: 0 lvb_type: 0 Jul 15 15:20:08 fir-md1-s1 kernel: LustreError: 21414:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 242 previous similar messages Jul 15 15:20:22 fir-md1-s1 kernel: LustreError: 23697:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229132, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2815b21200/0x5d9ee640c37db5c8 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 904 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 
23697 timeout: 0 lvb_type: 0 Jul 15 15:20:22 fir-md1-s1 kernel: LustreError: 23697:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 86 previous similar messages Jul 15 15:20:31 fir-md1-s1 kernel: LustreError: 10502:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229141, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f05fc7500/0x5d9ee640c37db6c4 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 904 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10502 timeout: 0 lvb_type: 0 Jul 15 15:20:31 fir-md1-s1 kernel: LustreError: 10502:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 15 15:20:47 fir-md1-s1 kernel: LustreError: 23672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229157, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f143c6f98c0/0x5d9ee640c37db845 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 904 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23672 timeout: 0 lvb_type: 0 Jul 15 15:20:47 fir-md1-s1 kernel: LustreError: 23672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Jul 15 15:20:55 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f12300a8600 x1631565736890448/t0(0) o101->42800284-789e-e9cc-0ebd-dbacb154f6ac@10.9.107.31@o2ib4:0/0 lens 576/0 e 0 to 0 dl 1563229260 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 15 15:20:55 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5412 previous similar messages Jul 15 15:21:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired 
after 143s: evicting client at 10.8.28.5@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f33bdfd0900/0x5d9ee640c377f265 lrc: 3/0,0 mode: PR/PR res: [0x2c002c39f:0x28a7:0x0].0x0 bits 0x13/0x0 rrc: 500 type: IBT flags: 0x60200400000020 nid: 10.8.28.5@o2ib6 remote: 0x83a6390e06f652ed expref: 663 pid: 20996 timeout: 2344221 lvb_type: 0 Jul 15 15:21:16 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:143s); client may timeout. req@ffff8f162cf57800 x1638084117200976/t352605315585(0) o101->905c028c-e587-96e1-52d7-ae94e0d5428f@10.8.7.31@o2ib6:22/0 lens 1792/1192 e 1 to 0 dl 1563229132 ref 1 fl Complete:/0/0 rc 0/0 Jul 15 15:21:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 15:21:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 15:21:16 fir-md1-s1 kernel: LustreError: 23704:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.104.63@o2ib4: deadline 30:1s ago req@ffff8f2e57d65100 x1633881576987616/t0(0) o101->ec935c16-6a63-f875-145b-2db5feba3892@10.9.104.63@o2ib4:14/0 lens 576/0 e 0 to 0 dl 1563229274 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 15 15:21:16 fir-md1-s1 kernel: LustreError: 23704:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 15 15:21:16 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5463 previous similar messages Jul 15 15:21:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2cc0bc1b-7a1f-9dab-b36c-c6206a02385d (at 10.8.20.20@o2ib6) reconnecting Jul 15 15:21:33 fir-md1-s1 kernel: Lustre: Skipped 7802 previous similar messages Jul 15 15:21:56 fir-md1-s1 kernel: LNet: Service thread pid 21416 was inactive for 200.02s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 15 15:21:56 fir-md1-s1 kernel: Pid: 21416, comm: mdt00_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 15:21:56 fir-md1-s1 kernel: Call Trace: Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 15:21:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 15:21:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 15:21:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229316.21416 Jul 15 15:21:56 fir-md1-s1 kernel: Pid: 21368, comm: mdt00_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 15:21:56 fir-md1-s1 kernel: Call Trace: Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 15:21:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 15:21:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 15:21:56 fir-md1-s1 kernel: LNet: Service thread pid 97657 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 15:21:56 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 15 15:21:56 fir-md1-s1 kernel: Pid: 97657, comm: mdt01_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 15:21:56 fir-md1-s1 kernel: Call Trace: Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 15:21:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 15:21:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 15:21:56 fir-md1-s1 kernel: Pid: 21369, comm: mdt00_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 15:21:56 fir-md1-s1 kernel: Call Trace: Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] 
Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 15:21:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 15:21:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 15:21:56 fir-md1-s1 kernel: Pid: 97669, comm: mdt01_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 15:21:56 fir-md1-s1 kernel: Call Trace: Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 15:21:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 15:21:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 15:21:57 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 15 15:21:57 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 15 15:21:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 15:21:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 
[ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 15:21:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 15:21:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 15:21:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 15:21:57 fir-md1-s1 kernel: LNet: Service thread pid 23560 was inactive for 200.82s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 15 15:21:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229317.10364 Jul 15 15:21:57 fir-md1-s1 kernel: LNet: Service thread pid 23749 was inactive for 200.47s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 15 15:21:57 fir-md1-s1 kernel: LNet: Skipped 182 previous similar messages Jul 15 15:21:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229318.23455 Jul 15 15:21:58 fir-md1-s1 kernel: LNet: Service thread pid 21460 was inactive for 200.49s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 15 15:21:58 fir-md1-s1 kernel: LNet: Skipped 130 previous similar messages Jul 15 15:21:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229319.23567 Jul 15 15:22:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229320.23717 Jul 15 15:22:12 fir-md1-s1 kernel: LNet: Service thread pid 20996 was inactive for 200.39s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 15 15:22:12 fir-md1-s1 kernel: LNet: Skipped 104 previous similar messages Jul 15 15:22:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229332.20996 Jul 15 15:22:21 fir-md1-s1 kernel: LNet: Service thread pid 23598 was inactive for 200.33s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Jul 15 15:22:21 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 15 15:22:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229341.23598 Jul 15 15:22:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229342.23607 Jul 15 15:22:34 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:22:35 fir-md1-s1 kernel: LNet: Service thread pid 21452 was inactive for 200.13s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 15 15:22:35 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Jul 15 15:22:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229355.21452 Jul 15 15:22:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229358.23672 Jul 15 15:22:46 fir-md1-s1 kernel: LustreError: 23733:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229275, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2d8273cec0/0x5d9ee640c37de992 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23733 timeout: 0 lvb_type: 0 Jul 15 15:22:46 fir-md1-s1 kernel: LustreError: 23733:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Jul 15 15:23:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Jul 15 15:23:01 fir-md1-s1 kernel: Lustre: Skipped 15362 previous similar messages Jul 15 15:23:03 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f21538bb000 x1638894101777744/t0(0) o101->841377fb-5d3e-8b58-50de-caee09553c02@10.9.112.8@o2ib4:8/0 lens 576/0 e 0 to 0 dl 
1563229388 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 15 15:23:03 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11121 previous similar messages Jul 15 15:23:08 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:23:08 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 15 15:24:36 fir-md1-s1 kernel: LNet: Service thread pid 20732 was inactive for 200.43s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 15 15:24:36 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Jul 15 15:24:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563229476.20732 Jul 15 15:25:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 26f27ada-08f0-595f-95a1-db8559ff813e (at 10.8.8.18@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ff6647400, cur 1563229502 expire 1563229352 last 1563229275 Jul 15 15:27:19 fir-md1-s1 kernel: Lustre: 23729:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3760e6ce00 x1634126899674128/t0(0) o101->a7aad8e9-6055-f520-5dcf-5ea6b8e2ae73@10.9.104.52@o2ib4:24/0 lens 576/0 e 0 to 0 dl 1563229644 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 15 15:27:19 fir-md1-s1 kernel: Lustre: 23729:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 45820 previous similar messages Jul 15 15:29:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 15:29:27 fir-md1-s1 kernel: Lustre: Skipped 4240 previous similar messages Jul 15 15:30:24 fir-md1-s1 kernel: LNet: Service thread pid 23733 completed after 548.15s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 15 15:30:24 fir-md1-s1 kernel: Lustre: 97649:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:558s); client may timeout. req@ffff8f1e71560300 x1631545009821280/t0(0) o101->f5f74966-59a2-6619-dc33-28e321e9f975@10.9.108.31@o2ib4:6/0 lens 576/0 e 0 to 0 dl 1563229266 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 15 15:30:24 fir-md1-s1 kernel: Lustre: 97649:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 42 previous similar messages Jul 15 15:30:24 fir-md1-s1 kernel: LustreError: 20460:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.15.6@o2ib6: deadline 100:448s ago req@ffff8f17072d6300 x1639150855008368/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1563229376 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:30:24 fir-md1-s1 kernel: LustreError: 20460:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 15 15:30:24 fir-md1-s1 kernel: LNet: Skipped 421 previous similar messages Jul 15 15:30:24 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 15:30:24 fir-md1-s1 kernel: LustreError: 97661:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1bb05a4800 x1636733705968336/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 15:30:24 fir-md1-s1 kernel: LustreError: 97661:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 15 15:30:34 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563229824/real 1563229824] req@ffff8f2fd47b0f00 x1636733705955568/t0(0) o104->fir-MDT0002@10.8.17.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229834 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 15:30:44 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) 
@@@ Request sent has timed out for slow reply: [sent 1563229834/real 1563229834] req@ffff8f2fd47b0f00 x1636733705955568/t0(0) o104->fir-MDT0002@10.8.17.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229844 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 15:30:44 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 15:30:53 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f362c63da00/0x5d9ee640ba1257a2 lrc: 3/0,0 mode: PR/PR res: [0x2000222f5:0x2c5:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f265dcd7a8 expref: 244418 pid: 23608 timeout: 2344913 lvb_type: 0 Jul 15 15:30:53 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 39 previous similar messages Jul 15 15:30:54 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563229844/real 1563229844] req@ffff8f2fd47b0f00 x1636733705955568/t0(0) o104->fir-MDT0002@10.8.17.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229854 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 15:30:54 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 15:31:04 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563229854/real 1563229854] req@ffff8f2fd47b0f00 x1636733705955568/t0(0) o104->fir-MDT0002@10.8.17.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563229864 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 15:31:04 fir-md1-s1 kernel: Lustre: 23627:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 23627:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 
10.8.17.7@o2ib6) failed to reply to blocking AST (req@ffff8f2fd47b0f00 x1636733705955568 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f15bbdad7c0/0x5d9ee640c377d363 lrc: 4/0,0 mode: PR/PR res: [0x2c002c39f:0x28a8:0x0].0x0 bits 0x13/0x0 rrc: 1090 type: IBT flags: 0x60200400000020 nid: 10.8.17.7@o2ib6 remote: 0x995d44ac889bc5d7 expref: 287 pid: 24576 timeout: 2344943 lvb_type: 0 Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.17.7@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 40s: evicting client at 10.8.17.7@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f15bbdad7c0/0x5d9ee640c377d363 lrc: 3/0,0 mode: PR/PR res: [0x2c002c39f:0x28a8:0x0].0x0 bits 0x13/0x0 rrc: 1091 type: IBT flags: 0x60200400000020 nid: 10.8.17.7@o2ib6 remote: 0x995d44ac889bc5d7 expref: 288 pid: 24576 timeout: 0 lvb_type: 0 Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 15 15:31:04 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:4s); client may timeout. 
req@ffff8f324d363600 x1631621243475024/t0(0) o101->7904decb-1129-4831-4db2-1394d4834a08@10.9.108.47@o2ib4:0/0 lens 1768/0 e 0 to 0 dl 1563229860 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 21677:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.1.23@o2ib6: deadline 30:4s ago req@ffff8f2fbdf80c00 x1635095650204480/t0(0) o101->02f653ee-3954-8dc8-cd3c-07c80d9ed9d2@10.8.1.23@o2ib6:0/0 lens 576/0 e 0 to 0 dl 1563229860 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 21677:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 372 previous similar messages Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 23603:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f316db67200 x1636733706731744/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 15:31:04 fir-md1-s1 kernel: LustreError: 23603:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 15 15:31:04 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 91946 previous similar messages Jul 15 15:31:06 fir-md1-s1 kernel: LustreError: 23584:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ec5a66600 x1636733706786368/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 15:31:06 fir-md1-s1 kernel: LustreError: 23584:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Jul 15 15:31:08 fir-md1-s1 kernel: LustreError: 97657:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f15b7e6a700 x1636733706844992/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 15:31:08 fir-md1-s1 kernel: LustreError: 97657:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages 
Jul 15 15:31:26 fir-md1-s1 kernel: LustreError: 21676:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2b17220900 x1636733707267456/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 15:31:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6bb1b23c-28f8-153d-8cc1-2ff0115f9167 (at 10.9.106.58@o2ib4) reconnecting Jul 15 15:31:33 fir-md1-s1 kernel: Lustre: Skipped 25644 previous similar messages Jul 15 15:31:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f240b67d100/0x5d9ee640a619cbb4 lrc: 3/0,0 mode: PR/PR res: [0x20000fb8f:0x672:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f25f98072b expref: 198956 pid: 23750 timeout: 2344953 lvb_type: 0 Jul 15 15:31:54 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563229824, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f227de20240/0x5d9ee640c39383bc lrc: 3/0,1 mode: --/PW res: [0x2000222f5:0x2c5:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24580 timeout: 0 lvb_type: 0 Jul 15 15:31:54 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 15 15:31:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f3208797500/0x5d9ee640b61ac6a8 lrc: 3/0,0 mode: PR/PR res: [0x2000297f6:0x88a:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f26418bf6f expref: 173376 pid: 21379 timeout: 2344976 lvb_type: 0 Jul 
15 15:31:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Jul 15 15:33:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 15:33:05 fir-md1-s1 kernel: Lustre: Skipped 22433 previous similar messages Jul 15 15:33:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 15:33:23 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6afac91a-e1c8-0ca6-0677-8b79f37ef46e (at 10.8.17.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1675e89800, cur 1563230003 expire 1563229853 last 1563229776 Jul 15 15:33:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 15 15:33:57 fir-md1-s1 kernel: LustreError: 24580:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f16a45d3000 x1636733713971856/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 15:33:57 fir-md1-s1 kernel: LustreError: 24580:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 10 previous similar messages Jul 15 15:34:09 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Mon Jul 15 15:34:09 2019 Jul 15 15:34:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2ade0b9c-5691-7fbe-1d3a-8c6ce8591788 (at 10.8.17.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee4895400, cur 1563230051 expire 1563229901 last 1563229824 Jul 15 15:34:11 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Mon Jul 15 15:34:11 2019 Jul 15 15:34:26 fir-md1-s1 kernel: LNet: Service thread pid 23077 was inactive for 200.25s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 15:34:26 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Jul 15 15:34:26 fir-md1-s1 kernel: Pid: 23077, comm: mdt02_042 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 15:34:26 fir-md1-s1 kernel: Call Trace: Jul 15 15:34:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 15:34:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 15 15:34:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 15:34:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 15:34:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 15:34:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 15:34:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 15:34:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 15:34:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563230066.23077 Jul 15 15:34:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f31c0f9e540/0x5d9ee640b61b6f03 lrc: 3/0,0 mode: PR/PR res: [0x2000297f6:0x882:0x0].0x0 bits 0x5b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f26418e7fc expref: 16849 pid: 
25680 timeout: 2345126 lvb_type: 0 Jul 15 15:34:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 10 previous similar messages Jul 15 15:34:36 fir-md1-s1 kernel: LNet: Service thread pid 23077 completed after 209.64s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 15 15:38:28 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Mon Jul 15 15:38:28 2019 Jul 15 15:39:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 15:39:52 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 15 15:41:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 15:41:33 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 15 15:43:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 15:43:06 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 15 15:50:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 15:50:54 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 15 15:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 15:51:50 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 15:52:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 15:52:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages
Jul 15 15:53:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6)
Jul 15 15:53:15 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages
Jul 15 15:57:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f27b4ff2000, cur 1563231426 expire 1563231276 last 1563231199
Jul 15 16:01:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID
Jul 15 16:01:01 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages
Jul 15 16:02:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting
Jul 15 16:02:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages
Jul 15 16:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6)
Jul 15 16:03:43 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages
Jul 15 16:08:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jul 15 16:08:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message
Jul 15 16:11:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID
Jul 15 16:11:04 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages
Jul 15 16:12:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting
Jul 15 16:12:22 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages
Jul 15 16:13:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6)
Jul 15 16:13:44 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages
Jul 15 16:17:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2facc31400, cur 1563232624 expire 1563232474 last 1563232397
Jul 15 16:22:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID
Jul 15 16:22:04 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages
Jul 15 16:22:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting
Jul 15 16:22:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Jul 15 16:23:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6)
Jul 15 16:23:46 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages
Jul 15 16:25:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jul 15 16:25:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 16:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 16:32:50 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 15 16:33:15 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563233588/real 1563233588] req@ffff8f196fc30600 x1636733815041760/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563233595 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 16:33:15 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 16:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 16:33:57 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 15 16:34:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 16:34:36 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 16:35:27 fir-md1-s1 kernel: Lustre: 22281:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563233718/real 1563233718] req@ffff8f3751e0bf00 x1636733817098400/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563233727 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 16:37:19 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563233829/real 1563233829] req@ffff8f122acee300 x1636733818923280/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563233839 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 16:37:29 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1563233839/real 1563233839] req@ffff8f122acee300 x1636733818923280/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563233849 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 16:37:34 fir-md1-s1 kernel: Lustre: 23673:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2ecf39c800 x1631605982370896/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:9/0 lens 376/1600 e 0 to 0 dl 1563233859 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 16:37:34 fir-md1-s1 kernel: Lustre: 23673:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 37318 previous similar messages Jul 15 16:37:39 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563233849/real 1563233849] req@ffff8f122acee300 x1636733818923280/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563233859 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 16:37:39 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 16:37:52 fir-md1-s1 kernel: Lustre: 25675:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563233862/real 1563233862] req@ffff8f2ead4e1b00 x1636733819154240/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563233872 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 16:37:52 fir-md1-s1 kernel: Lustre: 25675:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 16:38:02 fir-md1-s1 kernel: LustreError: 25675:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.9@o2ib6) failed to reply to blocking AST (req@ffff8f2ead4e1b00 x1636733819154240 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f0939ae2ac0/0x5d9ee640dbaee91f lrc: 4/0,0 mode: PR/PR res: [0x200029790:0x11e6:0x0].0x0 bits 0x5b/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 
10.8.9.9@o2ib6 remote: 0x243de7f2700d1d14 expref: 1987044 pid: 21482 timeout: 2348961 lvb_type: 0 Jul 15 16:38:02 fir-md1-s1 kernel: LustreError: 25675:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Jul 15 16:38:02 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.9@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 15 16:38:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 16:38:02 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 40s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f0939ae2ac0/0x5d9ee640dbaee91f lrc: 3/0,0 mode: PR/PR res: [0x200029790:0x11e6:0x0].0x0 bits 0x5b/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f2700d1d14 expref: 1987040 pid: 21482 timeout: 0 lvb_type: 0 Jul 15 16:38:03 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.9@o2ib6 arrived at 1563233883 with bad export cookie 6746082457696901512 Jul 15 16:38:03 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1653 previous similar messages Jul 15 16:38:07 fir-md1-s1 kernel: LustreError: 33422:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.9@o2ib6 arrived at 1563233887 with bad export cookie 6746082457696901512 Jul 15 16:38:07 fir-md1-s1 kernel: LustreError: 33422:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 260 previous similar messages Jul 15 16:38:15 fir-md1-s1 kernel: LustreError: 26626:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.9@o2ib6 arrived at 1563233895 with bad export cookie 6746082457696901512 Jul 15 16:38:15 fir-md1-s1 kernel: LustreError: 26626:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 471 previous similar messages Jul 15 16:38:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not 
available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 16:38:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 16:38:31 fir-md1-s1 kernel: LustreError: 20371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.9@o2ib6 arrived at 1563233911 with bad export cookie 6746082457696901512 Jul 15 16:38:31 fir-md1-s1 kernel: LustreError: 20371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 785 previous similar messages Jul 15 16:39:03 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.9@o2ib6 arrived at 1563233943 with bad export cookie 6746082457696901512 Jul 15 16:39:03 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1766 previous similar messages Jul 15 16:39:33 fir-md1-s1 kernel: LustreError: 25675:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563233882, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f29036a0b40/0x5d9ee640dedc755c lrc: 3/0,1 mode: --/PW res: [0x200029790:0x11e6:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 25675 timeout: 0 lvb_type: 0 Jul 15 16:39:33 fir-md1-s1 kernel: LustreError: 25675:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Jul 15 16:40:06 fir-md1-s1 kernel: LustreError: 24583:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2eeb781e00 x1636733824511856/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:40:42 fir-md1-s1 kernel: LNet: Service thread pid 25675 was inactive for 200.09s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 16:40:42 fir-md1-s1 kernel: Pid: 25675, comm: mdt02_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 16:40:42 fir-md1-s1 kernel: Call Trace: Jul 15 16:40:42 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 16:40:42 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 16:40:42 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 16:40:42 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 16:40:42 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 15 16:40:43 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 15 16:40:43 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 15 16:40:43 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 16:40:43 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 16:40:43 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 16:40:43 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 16:40:43 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 16:40:43 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 16:40:43 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 16:40:43 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 16:40:43 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 16:40:43 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 16:40:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563234043.25675 Jul 15 16:41:51 fir-md1-s1 kernel: LustreError: 23746:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f26540f9e00 x1636733825167024/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:42:16 fir-md1-s1 kernel: Lustre: 23695:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply 
req@ffff8f2a7789b600 x1633727112765504/t0(0) o101->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:21/0 lens 480/568 e 0 to 0 dl 1563234141 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 16:42:16 fir-md1-s1 kernel: Lustre: 23695:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 15 16:42:20 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f0de6250000/0x5d9ee640dbb0e43d lrc: 3/0,0 mode: PR/PR res: [0x200029937:0x12b4:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f2700d9f0b expref: 982283 pid: 23704 timeout: 2349200 lvb_type: 0 Jul 15 16:42:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6) reconnecting Jul 15 16:42:53 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 15 16:43:21 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563234111, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2fb2c00900/0x5d9ee640dfc1d5c3 lrc: 3/0,1 mode: --/PW res: [0x200029937:0x12b4:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23746 timeout: 0 lvb_type: 0 Jul 15 16:43:25 fir-md1-s1 kernel: LustreError: 23695:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f276a481800 x1636733825737968/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:43:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 26ab021a-adb7-b814-3d61-a4e6dec4651f (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2654064c00, cur 1563234223 expire 1563234073 last 1563233996 Jul 15 16:43:54 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f13ec0a4140/0x5d9ee640dbb141eb lrc: 3/0,0 mode: PR/PR res: [0x200029937:0x12ae:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f2700db593 expref: 813755 pid: 97672 timeout: 2349294 lvb_type: 0 Jul 15 16:43:55 fir-md1-s1 kernel: LustreError: 20462:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1dbb27bf00 x1636733826427904/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:44:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 16:44:01 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 15 16:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 26ab021a-adb7-b814-3d61-a4e6dec4651f (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a59edbc00, cur 1563234264 expire 1563234114 last 1563234037 Jul 15 16:44:49 fir-md1-s1 kernel: LNet: Service thread pid 25675 completed after 446.44s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 15 16:44:49 fir-md1-s1 kernel: LustreError: 21461:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1d70e83f00 x1636733827929376/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:44:55 fir-md1-s1 kernel: LustreError: 23695:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563234205, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f34807f6780/0x5d9ee640dff5c6a5 lrc: 3/0,1 mode: --/PW res: [0x200029937:0x12ae:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23695 timeout: 0 lvb_type: 0 Jul 15 16:45:11 fir-md1-s1 kernel: LNet: Service thread pid 23746 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 15 16:45:11 fir-md1-s1 kernel: Pid: 23746, comm: mdt02_097 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 16:45:11 fir-md1-s1 kernel: Call Trace: Jul 15 16:45:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 16:45:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 16:45:11 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 15 16:45:11 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 15 16:45:11 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 15 16:45:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 16:45:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 16:45:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 16:45:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 16:45:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 16:45:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563234311.23746 Jul 15 16:45:14 fir-md1-s1 kernel: Lustre: 24586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d2fea0f00 x1636464069256288/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:19/0 lens 480/568 e 0 to 0 dl 1563234319 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 16:45:14 fir-md1-s1 kernel: Lustre: 24586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 15 16:45:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1243babcc0/0x5d9ee640dbaeeb41 lrc: 3/0,0 mode: PR/PR res: [0x200029790:0x11dc:0x0].0x0 bits 0x5b/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f2700d1e80 expref: 681379 pid: 24578 timeout: 2349378 lvb_type: 0 Jul 15 16:45:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 15 16:45:25 fir-md1-s1 kernel: LustreError: 23716:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f323b6fb600 x1636733828413104/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:45:25 fir-md1-s1 kernel: LustreError: 23716:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 15 16:45:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection 
from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 16:45:37 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 15 16:45:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5f3b8986-88bc-dd5d-4c41-5670b4e69c0b (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4056914800, cur 1563234353 expire 1563234203 last 1563234126 Jul 15 16:46:19 fir-md1-s1 kernel: LustreError: 21461:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563234289, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1e54b9e0c0/0x5d9ee640e0aae149 lrc: 3/0,1 mode: --/PW res: [0x200029790:0x11dc:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21461 timeout: 0 lvb_type: 0 Jul 15 16:46:19 fir-md1-s1 kernel: LustreError: 21461:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 15 16:47:16 fir-md1-s1 kernel: LNet: Service thread pid 20462 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 16:47:16 fir-md1-s1 kernel: Pid: 20462, comm: mdt01_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 16:47:16 fir-md1-s1 kernel: Call Trace: Jul 15 16:47:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 16:47:16 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 16:47:16 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 15 16:47:16 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 15 16:47:16 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 15 16:47:16 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 16:47:16 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 16:47:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 16:47:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 16:47:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 16:47:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563234436.20462 Jul 15 16:48:09 fir-md1-s1 kernel: LNet: Service thread pid 21461 was inactive for 200.44s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 16:48:09 fir-md1-s1 kernel: Pid: 21461, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 16:48:09 fir-md1-s1 kernel: Call Trace: Jul 15 16:48:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 16:48:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 16:48:09 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 16:48:09 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 16:48:09 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 15 16:48:09 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 15 16:48:09 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 15 16:48:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 16:48:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 16:48:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 16:48:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 16:48:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 16:48:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 16:48:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 16:48:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 16:48:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 16:48:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 16:48:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563234490.21461 Jul 15 16:48:30 fir-md1-s1 kernel: LNet: Service thread pid 23454 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 16:48:30 fir-md1-s1 kernel: Pid: 23454, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 16:48:30 fir-md1-s1 kernel: Call Trace: Jul 15 16:48:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 16:48:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 16:48:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 15 16:48:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 16:48:30 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 15 16:48:30 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 15 16:48:30 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 15 16:48:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 16:48:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 16:48:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 16:48:31 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 16:48:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 16:48:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 16:48:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 16:48:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 16:48:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 16:48:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 16:48:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563234511.23454 Jul 15 16:48:45 fir-md1-s1 kernel: LNet: Service thread pid 23716 was inactive for 200.19s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 15 16:48:45 fir-md1-s1 kernel: Pid: 23716, comm: mdt02_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 15 16:48:45 fir-md1-s1 kernel: Call Trace: Jul 15 16:48:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 15 16:48:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 15 16:48:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 15 16:48:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 15 16:48:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 15 16:48:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563234525.23716 Jul 15 16:49:09 fir-md1-s1 kernel: LNet: Service thread pid 21461 completed after 259.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 15 16:49:20 fir-md1-s1 kernel: LNet: Service thread pid 23454 completed after 249.82s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 15 16:49:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 16:49:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 16:50:42 fir-md1-s1 kernel: LustreError: 23700:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ec84dd100 x1636733830796208/t0(0) o104->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 15 16:51:06 fir-md1-s1 kernel: LNet: Service thread pid 23746 completed after 554.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 15 16:51:07 fir-md1-s1 kernel: Lustre: 23695:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2eadbb0600 x1633736980918880/t0(0) o101->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:12/0 lens 480/568 e 0 to 0 dl 1563234672 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 16:51:07 fir-md1-s1 kernel: Lustre: 23695:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 15 16:51:11 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f081e79e0c0/0x5d9ee640dd8a0dfd lrc: 3/0,0 mode: PW/PW res: [0x200029dbd:0x6:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.9.9@o2ib6 remote: 0x243de7f270894fc3 expref: 231285 pid: 21415 timeout: 2349731 lvb_type: 0 Jul 15 16:51:11 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous 
similar messages Jul 15 16:52:12 fir-md1-s1 kernel: LustreError: 23700:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563234642, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2df5783f00/0x5d9ee640e202344d lrc: 3/1,0 mode: --/PR res: [0x200029dbd:0x6:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23700 timeout: 0 lvb_type: 0 Jul 15 16:52:12 fir-md1-s1 kernel: LustreError: 23700:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jul 15 16:53:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 545f12c1-4799-a254-b9c4-f75f43e1bc5b (at 10.8.27.23@o2ib6) reconnecting Jul 15 16:53:10 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 15 16:53:12 fir-md1-s1 kernel: LNet: Service thread pid 20462 completed after 557.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 15 16:54:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 16:54:05 fir-md1-s1 kernel: Lustre: Skipped 146 previous similar messages Jul 15 16:54:06 fir-md1-s1 kernel: LNet: Service thread pid 23716 completed after 521.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 15 16:55:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 16:55:38 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 15 17:03:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 17:03:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 17:04:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 17:04:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 15 17:05:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 17:05:54 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 17:06:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 17:06:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 17:13:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 17:13:55 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 17:14:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 17:14:27 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Jul 15 17:16:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 17:16:31 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 15 17:23:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 15 17:23:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 17:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 17:24:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 17:24:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 17:24:29 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 15 17:27:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 17:27:26 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 17:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 17:34:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 17:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 17:34:53 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 15 17:37:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 17:37:56 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 15 17:38:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 17:38:44 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 17:42:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f287b120800, cur 1563237766 expire 1563237616 last 1563237539 Jul 15 17:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 17:44:53 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 17:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 17:44:53 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Jul 15 17:47:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 17:47:59 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 15 17:51:05 fir-md1-s1 kernel: Lustre: 22281:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563238258/real 1563238258] req@ffff8f2308cf6f00 x1636733860906832/t0(0) o104->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563238265 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 17:51:05 fir-md1-s1 kernel: Lustre: 22281:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 15 17:51:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 17:51:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 17:54:18 fir-md1-s1 kernel: Lustre: 23628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563238451/real 1563238451] req@ffff8f372502e300 x1636733862725472/t0(0) o104->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563238458 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 17:54:25 fir-md1-s1 kernel: Lustre: 23628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563238458/real 1563238458] req@ffff8f372502e300 x1636733862725472/t0(0) o104->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563238465 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 17:54:35 fir-md1-s1 kernel: Lustre: 24583:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563238468/real 1563238468] req@ffff8f2ddda93f00 x1636733862784896/t0(0) o104->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563238475 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 17:54:35 fir-md1-s1 kernel: Lustre: 24583:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 17:54:36 fir-md1-s1 kernel: Lustre: 21429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f233a70f800 x1638241512296464/t413336594610(0) o36->b74b4b66-65f0-f951-331c-463b7f96e033@10.9.0.62@o2ib4:11/0 lens 488/3152 e 1 to 0 dl 1563238481 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 17:54:36 fir-md1-s1 kernel: Lustre: 21429:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 15 17:54:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 17:54:54 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 15 17:54:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 
10.8.12.12@o2ib6) reconnecting Jul 15 17:54:58 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 15 17:58:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 17:58:08 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 15 17:58:49 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563238721/real 1563238721] req@ffff8f167e014200 x1636733864394464/t0(0) o104->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563238729 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 18:02:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 18:02:55 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 18:05:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 18:05:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 18:05:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 18:05:10 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 18:08:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 18:08:15 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 18:13:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1af0ab3000, cur 1563239630 expire 1563239480 last 1563239403 Jul 15 18:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 18:15:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 15 18:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 18:15:31 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 15 18:19:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 15 18:19:16 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 15 18:25:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 18:25:36 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 15 18:26:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 18:26:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 15 18:29:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 18:29:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 18:32:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 18:32:50 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 18:33:01 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 18:35:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 15 18:35:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 18:35:43 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 15 18:36:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 18:36:35 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 18:39:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 18:43:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 18:43:54 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 15 18:46:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 18:46:01 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 15 18:46:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 18:46:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 18:53:55 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 15 18:54:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 18:54:53 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 15 18:55:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 18:56:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 18:56:07 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 15 18:56:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 18:56:49 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 15 19:01:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 19:03:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 19:03:55 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 19:04:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 19:04:59 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 15 19:06:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 19:06:15 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 15 19:06:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 19:06:50 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 15 19:08:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 19:08:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 19:14:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 19:14:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 19:15:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 19:15:22 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 19:16:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 19:16:18 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 15 19:16:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 19:16:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 19:25:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 15 19:25:29 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 15 19:26:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 19:26:34 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 15 19:26:34 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 19:27:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 19:27:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 15 19:31:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 15 19:31:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 19:35:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 19:35:31 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 15 19:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 19:36:36 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 15 19:36:38 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 19:36:45 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 15 19:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 19:37:13 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 19:43:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 19:43:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 19:46:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 19:46:40 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 15 19:47:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 19:47:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 15 19:47:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 19:47:49 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 15 19:56:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 19:56:42 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 15 19:58:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 19:58:01 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 19:58:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 19:58:20 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 20:01:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 20:01:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 20:06:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 20:06:55 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 15 20:08:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 20:08:21 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 15 20:08:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 20:08:23 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 15 20:17:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 20:17:08 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 15 20:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 20:18:31 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 20:20:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 15 20:20:40 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 20:26:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 20:26:09 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 20:27:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 20:27:38 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 15 20:29:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 20:29:08 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 20:31:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 20:31:38 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 15 20:32:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 20:32:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 20:37:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 20:37:41 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 15 20:39:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 20:39:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 15 20:42:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 20:42:44 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 15 20:45:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 20:45:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 20:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 15 20:48:02 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 15 20:49:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 15 20:49:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 15 20:53:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 20:53:07 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 20:53:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 20:53:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 20:58:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 20:58:03 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 15 20:59:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 15 20:59:57 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 21:03:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 21:03:24 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 15 21:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 21:08:10 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 15 21:10:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 
10.8.12.12@o2ib6) reconnecting Jul 15 21:10:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 15 21:10:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 21:10:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 21:10:47 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563250240/real 1563250240] req@ffff8f230a170f00 x1636733966598672/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563250247 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 15 21:10:54 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563250247/real 1563250247] req@ffff8f230a170f00 x1636733966598672/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563250254 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 21:10:55 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1667798c00 x1638880242767984/t413338207064(0) o36->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:0/0 lens 488/3152 e 1 to 0 dl 1563250260 ref 2 fl Interpret:/0/0 rc 0/0 Jul 15 21:11:08 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563250261/real 1563250261] req@ffff8f230a170f00 x1636733966598672/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563250268 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 21:11:08 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 15 21:11:29 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out 
for slow reply: [sent 1563250282/real 1563250282] req@ffff8f230a170f00 x1636733966598672/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563250289 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 21:11:29 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 15 21:12:04 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563250317/real 1563250317] req@ffff8f230a170f00 x1636733966598672/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563250324 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 15 21:12:04 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 15 21:12:25 fir-md1-s1 kernel: LustreError: 97664:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) returned error from blocking AST (req@ffff8f230a170f00 x1636733966598672 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1edf0d2640/0x5d9ee641304f91e5 lrc: 4/0,0 mode: PR/PR res: [0x200021809:0xa7bd:0x0].0x0 bits 0x1b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x82aca788efed803b expref: 32 pid: 22004 timeout: 2365554 lvb_type: 0 Jul 15 21:12:25 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Jul 15 21:12:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1edf0d2640/0x5d9ee641304f91e5 lrc: 3/0,0 mode: PR/PR res: [0x200021809:0xa7bd:0x0].0x0 bits 0x1b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x82aca788efed803b expref: 33 pid: 22004 timeout: 0 lvb_type: 0 Jul 15 21:12:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
adf35582-4d20-3e77-c285-74f26ce3ea8c (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f3a39c800, cur 1563250373 expire 1563250223 last 1563250146 Jul 15 21:13:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 21:13:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 21:18:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 21:18:19 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 15 21:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 21:20:24 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 21:20:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 21:20:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 15 21:25:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 21:25:46 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 15 21:28:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 21:28:28 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 21:30:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 21:30:40 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 15 21:35:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 21:35:53 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 21:38:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 21:38:34 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 15 21:40:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 21:40:44 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 21:41:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 21:41:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 21:46:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 21:46:14 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 15 21:48:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b52ebac00, cur 1563252495 expire 1563252345 last 1563252268 Jul 15 21:48:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 15 21:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 21:48:40 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 15 21:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 21:50:52 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 21:52:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 21:55:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 21:55:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 21:56:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 21:56:20 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 15 21:58:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 21:58:41 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 15 22:01:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 22:01:01 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 22:06:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 22:06:29 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 15 22:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 22:08:43 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 15 22:09:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 22:11:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 22:11:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 22:17:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 22:17:26 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 15 22:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 15 22:18:48 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 15 22:21:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 22:21:18 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 15 22:24:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 22:24:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 15 22:27:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 22:27:26 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 15 22:28:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 22:28:53 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 15 22:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 22:31:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 22:36:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 15 22:36:44 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 22:37:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 22:37:46 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 15 22:38:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 22:38:55 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 15 22:41:44 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 15 22:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 22:41:58 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 15 22:48:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 22:48:09 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 15 22:49:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 22:49:12 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 22:49:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 22:49:13 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 15 22:51:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 22:51:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 22:58:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 22:58:13 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 15 22:59:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 22:59:17 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 15 23:02:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 23:02:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 15 23:05:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e0c94f31-4fd8-0024-8bee-d62de96f3c21 (at 10.8.20.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252e639000, cur 1563257104 expire 1563256954 last 1563256877 Jul 15 23:07:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 23:07:36 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 15 23:09:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 15 23:09:11 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 15 23:09:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 23:09:17 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 15 23:12:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 23:12:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 15 23:13:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4318abeb-9c4e-e233-4085-6c5a2c444d71 (at 10.8.31.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25019c3800, cur 1563257604 expire 1563257454 last 1563257377 Jul 15 23:13:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 15 23:19:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 23:19:19 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 15 23:21:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 23:21:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 15 23:23:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 23:23:01 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 15 23:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 23:29:29 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 15 23:31:17 fir-md1-s1 kernel: LustreError: 
137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 23:31:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 15 23:31:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 15 23:31:40 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 15 23:33:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 15 23:33:07 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 15 23:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 15 23:39:29 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 15 23:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 15 23:41:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 15 23:41:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 15 23:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 15 23:43:22 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 15 23:49:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 15 23:49:36 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 15 23:51:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 15 23:51:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 15 23:53:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 23:53:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 15 23:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 15 23:54:05 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 15 23:57:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 15 23:59:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 15 23:59:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 00:00:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 00:02:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 00:02:59 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 16 00:04:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 00:04:12 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 16 00:10:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 00:10:23 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 16 00:13:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 00:13:03 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 00:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 00:14:21 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 16 00:15:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 00:16:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 00:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 00:20:35 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 16 00:23:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 00:23:06 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 00:26:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 00:26:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 00:30:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 00:30:45 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 16 00:32:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 00:33:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 00:33:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 00:33:49 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 16 00:36:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 00:36:45 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 00:40:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 00:40:48 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 16 00:43:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 00:43:53 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 16 00:46:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 00:46:55 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 00:51:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 00:51:24 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 16 00:53:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 00:53:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 00:57:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 00:57:05 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 00:58:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 01:02:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 01:02:01 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 16 01:05:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 01:05:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:05:53 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 01:06:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 01:08:59 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 01:12:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 01:12:18 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 16 01:16:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 01:16:42 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 16 01:19:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 01:19:03 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 16 01:21:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 01:22:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:22:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 01:22:34 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 01:23:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:27:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:29:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 01:29:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 01:29:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 01:29:20 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 16 01:31:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:32:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 01:32:42 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 01:36:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 01:37:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:39:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:41:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 01:41:10 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 01:42:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 01:42:05 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 16 01:43:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 01:43:39 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 16 01:44:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 01:49:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 01:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 01:51:56 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 01:53:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 01:53:41 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 01:53:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 01:53:41 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 16 02:02:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 02:02:09 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 02:04:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 02:04:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 02:04:11 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 16 02:04:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 16 02:08:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 02:08:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 02:12:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 02:12:49 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 02:13:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 02:14:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 02:14:28 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 16 02:15:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 02:15:00 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 16 02:17:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 02:23:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 02:23:13 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 16 02:23:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 02:24:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 02:24:29 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 16 02:25:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 02:25:01 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 16 02:33:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 02:33:23 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 16 02:34:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 02:34:46 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 16 02:37:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 02:37:02 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 02:43:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 02:43:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 16 02:44:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 02:44:49 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 16 02:47:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 02:47:07 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 16 02:52:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 02:52:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 02:53:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 02:53:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 02:53:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 16 02:54:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 02:54:52 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 16 02:58:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 02:58:05 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 03:03:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 03:03:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 03:05:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 03:05:04 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 16 03:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 03:08:06 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 03:10:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 03:11:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 03:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 03:13:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 16 03:15:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 03:15:09 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 16 03:15:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 03:18:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 03:18:40 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 16 03:21:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 03:22:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 03:22:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 03:23:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 03:23:49 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 16 03:25:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 03:25:25 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 16 03:28:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 03:28:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 16 03:34:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 03:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 03:34:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 03:35:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 03:35:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 03:39:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 03:39:46 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 03:43:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 03:43:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 03:44:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 03:44:47 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 03:46:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 03:46:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 16 03:49:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 03:49:57 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 16 03:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 03:55:03 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 03:56:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 03:56:26 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 16 03:58:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 03:58:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 04:00:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 04:00:03 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 04:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 04:05:58 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 04:06:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 04:06:39 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 04:10:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 04:10:24 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 04:14:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 04:16:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 04:16:07 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 16 04:16:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 04:16:55 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 16 04:21:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 04:21:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 04:25:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 04:25:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 04:27:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 04:27:21 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 04:27:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 04:27:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 04:32:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 04:32:04 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 16 04:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 04:37:52 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 16 04:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 04:37:52 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 16 04:41:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 04:43:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 04:43:24 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 04:48:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 04:48:11 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 16 04:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 04:48:34 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 04:53:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 04:53:56 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 04:55:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 04:55:22 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 04:58:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 04:58:22 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 16 04:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 04:58:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 05:04:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 05:04:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 05:06:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 05:06:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 05:08:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 05:08:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 05:08:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 05:08:56 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 05:16:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 05:16:18 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 16 05:16:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 05:16:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 05:19:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 05:19:03 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 16 05:20:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 05:20:39 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 16 05:27:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 05:27:22 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 16 05:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 05:29:29 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 05:30:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 05:30:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 05:30:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 05:30:45 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 05:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d29abf800, cur 1563280627 expire 1563280477 last 1563280400 Jul 16 05:37:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 05:37:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 05:37:47 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 05:39:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 05:39:41 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 16 05:40:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 05:40:18 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 16 05:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 05:40:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 05:48:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 05:48:14 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 16 05:49:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 05:49:42 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 16 05:51:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 05:51:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 05:52:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 05:52:27 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 05:58:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 05:58:16 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 06:00:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 06:00:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 16 06:03:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 06:03:12 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 06:05:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 06:05:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 06:08:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 06:08:18 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 16 06:10:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 06:10:18 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 06:13:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 06:13:20 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 06:18:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 06:18:19 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 06:18:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 06:18:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 06:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 06:20:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 16 06:23:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 06:23:35 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 06:28:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 06:28:32 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 16 06:30:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 06:30:40 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 16 06:34:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 06:34:09 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 06:38:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 06:38:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 06:38:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 06:38:43 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 06:40:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 06:40:45 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 06:44:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 06:44:39 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 06:48:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 06:48:47 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 16 06:50:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 06:50:47 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 16 06:50:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 06:54:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 06:54:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 06:58:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 06:58:59 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 16 07:00:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 07:00:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 07:00:53 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 16 07:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 07:04:48 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 07:10:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 07:10:37 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 16 07:10:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 07:10:57 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 16 07:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 07:14:58 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 07:21:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 07:21:06 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 07:21:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 07:21:06 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 07:25:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 07:25:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 07:26:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 07:26:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 07:31:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 07:31:18 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 16 07:31:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 07:31:18 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 16 07:35:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 07:35:35 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 07:35:46 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563287739/real 1563287739] req@ffff8f0fa1b04b00 x1636734259257280/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563287746 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 16 07:35:46 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 16 07:35:54 fir-md1-s1 kernel: Lustre: 23617:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c03ef0600 x1637074370050064/t0(0) 
o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1563287759 ref 2 fl Interpret:/0/0 rc 0/0 Jul 16 07:36:00 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563287753/real 1563287753] req@ffff8f0fa1b04b00 x1636734259257280/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563287760 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 07:36:00 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 16 07:36:21 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563287774/real 1563287774] req@ffff8f0fa1b04b00 x1636734259257280/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563287781 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 07:36:21 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 16 07:36:56 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563287809/real 1563287809] req@ffff8f0fa1b04b00 x1636734259257280/t0(0) o106->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563287816 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 07:36:56 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 16 07:37:11 fir-md1-s1 kernel: Lustre: 20724:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22ed7acb00 x1638743228030656/t413353082599(0) o36->957c1ad0-d547-b44d-0f14-5f92c3213a3d@10.8.15.3@o2ib6:16/0 lens 488/3152 e 1 to 0 dl 1563287836 ref 2 fl Interpret:/0/0 rc 0/0 Jul 16 07:38:06 fir-md1-s1 kernel: Lustre: 21455:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: 
[sent 1563287879/real 1563287879] req@ffff8f22a876f200 x1636734259459536/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563287886 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 07:38:06 fir-md1-s1 kernel: Lustre: 21455:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Jul 16 07:38:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e2395479-2e77-72eb-0463-0a8132abf7a2 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2adae4e800, cur 1563287894 expire 1563287744 last 1563287667 Jul 16 07:38:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2937be20-d10e-e8be-021e-f15970c1f1ca (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25019c8800, cur 1563287906 expire 1563287756 last 1563287679 Jul 16 07:38:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e2395479-2e77-72eb-0463-0a8132abf7a2 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ef8d69000, cur 1563287917 expire 1563287767 last 1563287690 Jul 16 07:38:37 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (68:110s); client may timeout. req@ffff8f0c03ef0600 x1637074370050064/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:29/0 lens 480/536 e 1 to 0 dl 1563287807 ref 1 fl Complete:/0/0 rc 301/301 Jul 16 07:41:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 47165b52-9a87-7d0f-a599-0176fbcd4b72 (at 10.8.9.8@o2ib6) in 187 seconds. I think it's dead, and I am evicting it. 
exp ffff8f450920b000, cur 1563288082 expire 1563287932 last 1563287895 Jul 16 07:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 07:41:27 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 16 07:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a199ce2c-443f-920d-f89a-ada6201ffa38 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ad660000, cur 1563288121 expire 1563287971 last 1563287894 Jul 16 07:42:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 07:42:13 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 16 07:46:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 07:46:16 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 07:49:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 07:51:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 07:51:36 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 16 07:52:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 07:52:37 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 16 07:56:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 07:56:59 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 08:00:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 08:02:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 08:02:01 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 16 08:02:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 08:02:46 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 08:07:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 08:07:06 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 16 08:11:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 08:12:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 08:12:23 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 08:12:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 08:12:49 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 16 08:18:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 08:18:16 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 08:22:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 08:22:31 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 16 08:23:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 08:23:28 fir-md1-s1 kernel: Lustre: Skipped 57 
previous similar messages Jul 16 08:24:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 08:26:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 169021a4-a808-827d-1880-f3d0a2ab5ac3 (at 10.9.103.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4519647400, cur 1563290814 expire 1563290664 last 1563290587 Jul 16 08:26:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 16 08:26:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 08:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 08:29:29 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 08:32:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 08:32:36 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 16 08:33:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 08:33:37 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 16 08:35:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f61b1c5d-c18b-55d2-6bab-10f13c5e21f5 (at 10.9.112.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3793e26000, cur 1563291300 expire 1563291150 last 1563291073 Jul 16 08:35:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 08:40:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 08:40:50 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 08:42:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 08:42:52 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 08:44:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 08:44:36 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 08:48:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 08:51:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 08:51:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 08:52:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ef5552800, cur 1563292336 expire 1563292186 last 1563292109 Jul 16 08:52:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 08:52:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 08:52:54 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 08:54:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 08:56:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19469e3400, cur 1563292575 expire 1563292425 last 1563292348 Jul 16 08:57:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 08:57:53 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 09:00:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:00:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:01:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 09:01:27 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 09:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 09:03:43 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 16 09:04:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:09:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 09:09:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 16 09:10:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 09:12:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:13:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 09:13:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 16 09:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 09:13:58 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 09:15:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:19:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 09:19:15 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 09:23:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 09:23:08 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 09:23:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1458fa5400, cur 1563294230 expire 1563294080 last 1563294003 Jul 16 09:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 09:24:14 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 16 09:24:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 09:24:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 09:29:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:29:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 09:29:19 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 16 09:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 09:33:19 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 16 09:34:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 09:34:33 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 16 09:39:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 09:39:21 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 09:43:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 09:43:49 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 09:44:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 09:44:41 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 09:45:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 09:45:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 09:48:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:49:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 09:49:35 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 09:50:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 09:54:06 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 09:54:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 09:54:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 09:54:59 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 16 10:00:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 10:00:43 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 10:01:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 10:01:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 10:04:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 10:04:32 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 16 10:05:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 10:05:03 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 16 10:10:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 10:10:57 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 10:13:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 10:13:59 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 16 10:15:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 10:15:12 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 10:15:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 10:15:12 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 16 10:22:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 10:22:14 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 10:25:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 10:25:14 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 16 10:25:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 10:25:14 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 16 10:32:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 10:32:05 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 10:32:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 10:32:32 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 16 10:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 10:35:23 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 16 10:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 10:35:23 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 16 10:43:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 10:43:24 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 10:45:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 10:45:43 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 10:45:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 10:45:43 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 16 10:53:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 10:53:44 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 10:56:06 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 10:56:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 10:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 10:56:06 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 16 10:56:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64680640-7b2b-7c79-c49e-13dc0069ad13 (at 10.9.107.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14f2e7f000, cur 1563299790 expire 1563299640 last 1563299563 Jul 16 10:56:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64680640-7b2b-7c79-c49e-13dc0069ad13 (at 10.9.107.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45307df400, cur 1563299791 expire 1563299641 last 1563299564 Jul 16 11:01:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 11:01:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 11:03:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 11:04:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 11:04:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 11:06:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 11:06:07 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 11:06:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 11:06:42 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 16 11:14:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 11:14:14 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 16 11:15:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 11:16:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 11:16:16 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 16 11:16:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 11:16:55 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 11:22:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 11:22:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 11:24:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 11:24:19 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 16 11:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 11:27:10 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 11:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 11:27:10 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 16 11:34:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 11:34:20 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 11:37:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 11:37:03 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 16 11:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 11:37:30 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 11:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 11:37:30 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 16 11:44:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 11:44:43 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 11:47:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 11:47:51 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 16 11:48:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ada4b8c4-a096-106c-aa30-49acf3f75a9b (at 10.9.115.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0f3c45f800, cur 1563302889 expire 1563302739 last 1563302662 Jul 16 11:48:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 16 11:48:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 11:48:11 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 16 11:55:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 11:55:47 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 16 11:56:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 11:58:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 11:58:04 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 16 11:58:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 11:58:33 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 12:05:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 12:05:53 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 16 12:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 12:08:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 12:08:44 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 16 12:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 12:08:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 16 12:16:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 16 12:16:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 16 12:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 12:19:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 16 12:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 12:19:07 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 12:22:15 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: haven't heard from client cca39c27-f446-a2c0-652d-39068d0785b9 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3676a79c00, cur 1563304935 expire 1563304785 last 1563304708 Jul 16 12:22:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 12:24:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 12:24:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 12:26:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 12:26:05 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 12:29:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 12:29:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 12:29:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 12:29:19 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 16 12:36:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 12:36:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 12:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 12:39:42 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 12:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 12:39:42 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 12:43:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 12:43:03 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 16 12:48:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 12:49:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 12:49:44 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 16 12:50:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 12:50:04 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 12:53:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 12:53:42 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 12:58:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1eb949cc00, cur 1563307087 expire 1563306937 last 1563306860 Jul 16 12:58:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 12:59:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 12:59:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 13:00:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 13:00:07 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 16 13:04:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 13:04:22 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 13:09:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 13:09:54 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 13:10:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 13:10:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 13:15:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 13:15:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 13:18:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 13:18:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 13:20:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 13:20:22 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 16 13:20:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 13:20:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 13:20:58 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 13:25:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 13:25:56 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 13:30:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 13:30:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 13:31:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 13:31:07 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 13:32:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 13:32:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 13:36:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 13:36:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 13:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 13:40:59 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 13:41:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 13:41:52 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 13:46:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 13:46:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 13:49:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 13:51:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 13:51:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 16 13:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 13:51:56 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 16 13:52:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 13:54:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 13:56:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 13:56:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 14:01:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 14:02:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 14:02:30 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 16 14:02:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 14:02:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 14:06:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 14:06:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 16 14:12:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 14:12:31 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 16 14:13:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 14:13:23 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 14:19:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 14:19:07 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 16 14:19:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 14:22:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 14:22:52 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 16 14:23:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 14:23:43 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 14:25:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 14:29:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 14:29:08 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 16 14:29:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 14:32:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 14:32:54 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 16 14:34:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 14:34:13 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 16 14:39:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 14:39:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 16 14:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 14:43:38 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 16 14:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 14:44:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 16 14:46:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 14:46:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 14:50:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 14:50:19 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 14:51:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 14:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 14:53:46 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 16 14:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 14:54:45 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 15:02:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 15:02:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:02:27 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 15:03:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 15:03:50 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 15:05:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 15:05:04 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 15:05:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:08:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:12:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 15:12:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 15:12:42 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 16 15:13:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:13:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 15:13:51 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 16 15:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 15:15:17 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 16 15:17:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:22:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 15:22:47 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 16 15:23:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 15:24:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 15:24:22 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 16 15:25:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 15:25:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 15:29:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:29:41 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 15:34:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 15:34:03 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 15:34:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 15:34:23 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 16 15:35:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 15:35:48 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 15:40:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 15:40:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 15:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 15:44:25 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 16 15:44:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 15:44:41 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 15:45:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 15:45:57 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 16 15:54:18 fir-md1-s1 kernel: Lustre: Modifying parameter fir-*.mdt.fir-*.enable_remote_rename in log params Jul 16 15:54:25 fir-md1-s1 kernel: Lustre: 126006:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563317658/real 1563317658] req@ffff8f1897ca3f00 x1636734640995024/t0(0) o104->MGS@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563317665 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 16 15:54:25 fir-md1-s1 kernel: Lustre: 126006:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 16 15:54:39 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Tue Jul 16 15:54:39 2019 Jul 16 15:54:43 fir-md1-s1 kernel: Lustre: 126006:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563317676/real 1563317676] req@ffff8f1785701e00 x1636734641234832/t0(0) o105->MGS@10.8.11.6@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1563317683 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 15:54:43 fir-md1-s1 kernel: Lustre: 126006:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 16 15:54:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 15:54:47 
fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 16 15:55:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 15:55:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 15:55:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 15:55:39 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 16 15:56:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 15:56:18 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 16 16:04:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 16:04:55 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 16 16:06:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 16:06:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 16:07:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 16:07:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 16:08:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 16:08:17 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 16:15:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 16:15:11 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 16:18:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 16:18:43 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 16:19:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 16:19:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 16 16:19:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 16:19:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 16:25:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 16:25:12 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 16 16:28:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 16:28:51 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 16 16:30:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 16:30:20 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 16:30:28 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563319821/real 1563319821] req@ffff8f2607264500 x1636734661665808/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563319828 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 16 16:30:28 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 16 16:30:35 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563319828/real 1563319828] req@ffff8f2607264500 x1636734661665808/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563319835 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 16:30:36 fir-md1-s1 kernel: Lustre: 23629:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f142843a400 x1638791526821904/t413391405187(0) o36->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:11/0 lens 488/3152 e 1 to 0 dl 1563319841 ref 2 fl Interpret:/0/0 rc 0/0 Jul 16 16:30:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 16 16:30:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 16:30:50 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563319842/real 1563319842] req@ffff8f2607264500 x1636734661665808/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563319849 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 16:30:50 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 16 16:31:11 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563319864/real 1563319864] req@ffff8f2607264500 x1636734661665808/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563319871 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 16:31:11 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 16 16:31:46 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563319899/real 1563319899] req@ffff8f2607264500 x1636734661665808/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563319906 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 16:31:46 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 16 16:32:02 fir-md1-s1 kernel: Lustre: 97662:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22747a8c00 x1633748027840640/t413391505134(0) o36->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:6/0 lens 488/3152 e 1 to 0 dl 1563319926 ref 2 fl Interpret:/0/0 rc 0/0 Jul 16 16:32:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
78b921fb-e823-1096-e04b-69af8497986a (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f6f00d800, cur 1563319974 expire 1563319824 last 1563319747 Jul 16 16:35:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 16:35:43 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 16 16:38:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 16:38:54 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 16:40:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 16:40:21 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 16 16:45:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 16:45:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 16:45:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 16:45:54 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 16 16:49:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 16:49:43 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 16:52:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 16:52:19 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 16:55:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 16:55:55 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 16:57:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 16:57:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 16:59:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 16:59:52 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 16 17:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 17:02:57 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 16 17:06:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 17:06:00 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 16 17:11:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 16 17:11:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 16 17:13:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 17:13:17 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 16 17:16:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 17:16:03 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 17:22:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 17:22:13 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 16 17:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 17:23:19 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 16 17:26:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 17:26:07 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar 
messages Jul 16 17:27:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 17:32:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 17:33:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 17:33:12 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 16 17:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 17:33:57 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 16 17:36:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 17:36:09 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 16 17:40:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 17:40:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 17:43:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 17:43:54 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 16 17:44:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 17:44:20 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 16 17:46:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 17:46:19 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 16 17:47:13 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:48:07 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:49:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 17:49:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 17:49:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9135e57a-f4e0-df9a-82b0-c2a48f21a734 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ec4d06c00, cur 1563324572 expire 1563324422 last 1563324345 Jul 16 17:49:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 17:51:16 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:51:52 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:52:24 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:54:11 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:54:41 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:54:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 16 17:54:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 16 17:55:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 17:55:00 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 17:56:16 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:56:16 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 16 17:56:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 16 17:56:23 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 16 17:57:46 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg 
is in inconsistent state, don't perform health checking (-125, 0) Jul 16 17:57:46 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 16 18:00:56 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563325249/real 1563325249] req@ffff8f26468fef00 x1636734711042144/t0(0) o104->fir-MDT0000@10.8.27.16@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563325256 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 16 18:00:56 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Jul 16 18:01:04 fir-md1-s1 kernel: Lustre: 20460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2d94b8d400 x1634620981668304/t0(0) o101->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:9/0 lens 1808/3288 e 1 to 0 dl 1563325269 ref 2 fl Interpret:/0/0 rc 0/0 Jul 16 18:01:10 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563325263/real 1563325263] req@ffff8f26468fef00 x1636734711042144/t0(0) o104->fir-MDT0000@10.8.27.16@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563325270 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 16 18:01:10 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 16 18:01:24 fir-md1-s1 kernel: LustreError: 97639:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.16@o2ib6) failed to reply to blocking AST (req@ffff8f26468fef00 x1636734711042144 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2513520480/0x5d9ee6433deac800 lrc: 4/0,0 mode: PR/PR res: [0x2000298c3:0x176:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.8.27.16@o2ib6 remote: 0xc5b8e2359c85e32d expref: 20 pid: 97665 timeout: 2440366 lvb_type: 0 Jul 16 18:01:24 fir-md1-s1 kernel: 
LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.16@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 16 18:01:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.27.16@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2513520480/0x5d9ee6433deac800 lrc: 3/0,0 mode: PR/PR res: [0x2000298c3:0x176:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.8.27.16@o2ib6 remote: 0xc5b8e2359c85e32d expref: 21 pid: 97665 timeout: 0 lvb_type: 0 Jul 16 18:01:47 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 18:02:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 18:02:07 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 16 18:03:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 393b5fde-e98f-60d4-0397-472006e679db (at 10.8.27.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2504d83400, cur 1563325383 expire 1563325233 last 1563325156 Jul 16 18:03:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 18:05:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 18:05:04 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 16 18:05:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 18:05:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 16 18:06:21 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 18:06:21 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 16 18:06:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 18:06:23 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 16 18:13:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 18:13:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 18:15:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 18:15:07 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 16 18:15:07 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 18:15:07 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Jul 16 18:15:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 16 18:15:14 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 18:16:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 18:16:23 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 16 18:25:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 18:25:20 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 16 18:25:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 18:25:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 18:26:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 18:26:26 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 16 18:27:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 18:27:08 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 18:29:16 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 18:29:16 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Jul 16 18:36:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 18:36:00 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 18:36:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 18:36:30 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 16 18:37:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 18:37:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 18:37:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 18:37:39 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 18:41:15 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 18:41:15 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 16 18:46:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 18:46:18 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 18:46:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 18:46:31 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 16 18:47:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 18:47:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 16 18:48:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 18:48:44 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 18:52:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 18:52:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Jul 16 18:56:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 18:56:47 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 16 18:57:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 18:57:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 18:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 18:59:20 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 16 18:59:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6cc40017-bd71-872e-b5f4-1fcc1b80ec51 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f238f8e0000, cur 1563328790 expire 1563328640 last 1563328563 Jul 16 18:59:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 16 19:00:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 19:00:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 19:03:30 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 19:03:30 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Jul 16 19:07:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 19:07:23 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 19:07:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 19:07:23 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 16 19:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 16 19:09:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 16 19:10:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 19:10:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 19:13:46 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 19:13:46 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 16 19:18:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 19:18:13 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 19:18:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 19:18:13 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 16 19:20:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 19:20:11 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 19:21:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 19:21:29 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 16 19:23:50 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 19:23:50 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Jul 16 19:28:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e1008ff1-7911-4d3d-cd72-11efd094b730 (at 10.8.8.30@o2ib6) Jul 16 19:28:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 16 19:28:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 19:28:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 19:29:48 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e0007c12-8677-1295-c9e3-dc73b6a618ac (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f43a80f6800, cur 1563330588 expire 1563330438 last 1563330361 Jul 16 19:29:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 19:30:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 19:30:52 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 16 19:36:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 19:36:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 19:39:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 19:39:06 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 16 19:39:30 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 19:39:30 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 16 19:39:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 19:39:42 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 19:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 19:41:12 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 19:46:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 27c48995-3d14-3108-d0d3-4e8aa244510b (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30c7dc6c00, cur 1563331597 expire 1563331447 last 1563331370 Jul 16 19:46:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 19:49:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Jul 16 19:49:19 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 16 19:50:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 19:50:24 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 16 19:53:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 19:53:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 16 19:53:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 4ed462a8-ed6a-0891-ced6-ebadfda1f88d (at 10.8.8.30@o2ib6) reconnecting Jul 16 19:53:59 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 16 19:55:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 19:55:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 19:59:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 19:59:22 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 20:01:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 20:01:53 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 16 20:05:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 20:05:34 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 16 20:07:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 16 20:07:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 10 previous similar messages Jul 16 20:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 20:09:30 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 16 20:10:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 20:10:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 20:11:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 20:11:55 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 16 20:15:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 20:15:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 16 20:19:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 20:19:52 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 16 20:22:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 16 20:22:46 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 16 20:25:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 20:25:55 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 16 20:29:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 20:29:51 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 16 20:30:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 20:30:20 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 16 20:33:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 16 20:33:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 20:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c0f757c7-52f2-44d4-89ab-447f2b8b1bf9 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22537adc00, cur 1563334520 expire 1563334370 last 1563334293 Jul 16 20:35:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 20:36:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 20:36:23 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 20:40:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 20:40:38 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 16 20:43:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 20:43:20 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 16 20:44:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 20:44:08 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 20:46:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 16 20:46:32 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 20:47:36 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e6e118000, cur 1563335256 expire 1563335106 last 1563335029 Jul 16 20:47:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 20:50:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 16 20:50:58 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 16 20:54:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 20:54:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 20:56:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 20:56:40 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 20:58:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 20:58:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 21:00:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 21:00:59 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 16 21:05:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 21:05:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 21:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 21:06:58 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 21:07:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a3ba9cea-729e-f1f0-0275-7cb26326b4e6 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e7fa05c00, cur 1563336448 expire 1563336298 last 1563336221 Jul 16 21:11:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 21:11:01 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 16 21:11:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 21:11:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 21:15:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 21:15:41 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 16 21:18:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 21:18:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 16 21:21:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 21:21:04 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 16 21:21:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 21:21:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 21:27:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 21:27:59 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 16 21:28:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 21:28:31 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 21:31:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 21:31:29 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 16 21:33:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 21:33:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 21:37:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 16 21:37:59 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 16 21:38:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 21:38:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 16 21:41:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 21:41:34 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 16 21:45:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 21:45:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 21:48:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 21:48:33 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 16 21:49:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 16 21:49:21 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 21:51:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 21:51:38 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 16 21:58:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 21:58:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 16 21:59:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb 
(at 10.8.10.21@o2ib6) reconnecting Jul 16 21:59:25 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 22:02:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 16 22:02:52 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 22:10:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 22:10:02 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 22:11:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 22:11:01 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 16 22:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 22:13:05 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 16 22:20:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 22:20:44 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 16 22:22:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 22:22:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 16 22:22:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 22:22:45 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 16 22:23:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 22:23:16 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 16 22:28:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 22:31:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 22:31:15 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 16 22:31:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 22:32:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 22:32:10 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 16 22:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 16 22:34:00 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 16 22:37:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 22:37:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 22:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 22:41:25 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 22:42:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 22:42:16 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 16 22:42:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eae53b8e-fc5a-00ac-6926-f45250ad9270 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f165e551400, cur 1563342162 expire 1563342012 last 1563341935 Jul 16 22:42:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 22:44:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 22:44:05 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 16 22:51:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 22:51:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 22:54:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 16 22:54:14 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 16 22:54:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 16 22:54:30 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 16 22:57:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 22:57:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 16 22:57:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5aa128b5-4096-9c78-a700-9a7d01aae891 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f05333800, cur 1563343078 expire 1563342928 last 1563342851 Jul 16 22:57:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 16 22:58:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4356abbb-5672-a468-8014-48f6625b62fb (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f26341e3000, cur 1563343095 expire 1563342945 last 1563342868 Jul 16 23:02:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 23:02:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 16 23:04:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to acd26ab4-a020-fbc0-1a40-f0e7d759131f (at 10.8.23.14@o2ib6) Jul 16 23:04:19 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 16 23:05:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 23:05:02 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 16 23:13:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 23:13:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 16 23:14:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 23:14:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 16 23:15:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 23:15:03 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 16 23:15:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 23:15:03 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 16 23:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 16 23:23:19 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 16 23:25:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 23:25:09 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 16 23:25:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 23:25:09 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 16 23:25:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 16 23:25:46 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 16 23:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 23:33:36 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 16 23:35:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 23:35:24 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 16 23:35:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 23:35:25 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 16 23:42:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 16 23:42:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 16 23:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 16 23:43:38 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 16 23:45:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 23:45:26 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 16 23:45:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 23:45:26 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 16 23:47:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4ed6882d-a199-f657-1462-42177ac2f2c7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2157382c00, cur 1563346070 expire 1563345920 last 1563345843 Jul 16 23:47:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 16 23:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 16 23:54:02 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 16 23:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 16 23:55:27 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 16 23:56:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 16 23:56:50 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 00:02:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 00:02:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 00:04:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 00:04:15 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 17 00:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 00:05:52 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 17 00:09:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 00:09:57 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 17 00:12:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 00:12:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 00:14:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 00:14:26 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 17 00:15:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 00:15:58 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 00:20:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 00:20:01 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 17 00:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 76984a38-5725-a313-b2a2-fe69e110dea9 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2df1f2f400, cur 1563348024 expire 1563347874 last 1563347797 Jul 17 00:20:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 00:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 00:24:37 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 00:25:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 00:25:48 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 00:25:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 00:25:58 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 17 00:30:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 00:30:04 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 00:33:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e8486f000, cur 1563348832 expire 1563348682 last 1563348605 Jul 17 00:33:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 00:35:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 00:35:17 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 00:36:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 00:36:17 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 00:37:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 00:37:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 00:41:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 00:41:07 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 00:46:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 00:46:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 00:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 00:47:19 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 00:51:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 00:51:41 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 17 00:52:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 86e03a66-c2f8-124c-0407-96779560ee5a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3470e94400, cur 1563349976 expire 1563349826 last 1563349749 Jul 17 00:56:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 00:56:41 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 17 00:58:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 00:58:07 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 17 01:01:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 01:01:42 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 01:02:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 17 01:02:13 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 01:06:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 01:06:48 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 17 01:07:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 01:08:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 67b1dfdf-2118-a1bb-f91e-bab57f5e8cfd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fae40e400, cur 1563350926 expire 1563350776 last 1563350699 Jul 17 01:08:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 01:10:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 01:10:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 17 01:11:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 01:11:48 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 17 01:12:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 01:16:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 01:16:55 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 17 01:18:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 17 01:18:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 01:21:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 01:21:40 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 01:23:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f1492286-4d18-8636-1e90-f4abec605bb6 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f58d21800, cur 1563351780 expire 1563351630 last 1563351553 Jul 17 01:23:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 01:23:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 01:23:00 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 17 01:27:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 01:27:48 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 17 01:28:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 01:28:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 01:32:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 01:32:35 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 01:33:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 01:33:03 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 17 01:38:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 01:38:21 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 17 01:40:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 01:40:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 01:41:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f8add610-9029-36ed-1131-bd3f36c7746e (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f168fbc9800, cur 1563352870 expire 1563352720 last 1563352643 Jul 17 01:41:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 01:43:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 01:43:39 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 01:44:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 01:44:22 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 01:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 01:49:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 01:51:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 01:51:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 01:53:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 01:53:57 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 01:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 01:54:23 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 17 01:57:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eec8ab8b-cf20-989c-f1d6-4ef269ba9173 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33727aa000, cur 1563353841 expire 1563353691 last 1563353614 Jul 17 01:57:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 01:59:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 01:59:08 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 17 02:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 02:05:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 17 02:06:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 02:06:52 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 17 02:09:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 02:09:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 02:11:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 02:11:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 17 02:11:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2a2fada3-b59e-d248-8134-52cef0856884 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1dd06dc400, cur 1563354710 expire 1563354560 last 1563354483 Jul 17 02:11:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 02:15:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 02:15:53 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 02:17:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 02:17:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 17 02:21:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 02:21:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 02:21:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 02:21:12 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 17 02:26:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 02:26:18 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 17 02:27:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 17 02:27:25 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 17 02:31:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 02:31:17 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 17 02:34:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 02:34:35 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 02:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 02:36:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 17 02:37:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 02:37:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 17 02:41:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 02:41:18 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 17 02:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ded40156-db07-2426-21a9-4ff4c918dfb8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee72dc800, cur 1563356716 expire 1563356566 last 1563356489 Jul 17 02:45:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 02:46:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 02:46:26 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 17 02:46:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 02:46:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 02:47:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 02:47:34 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 17 02:51:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 02:51:20 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 17 02:56:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 02:56:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 02:57:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 02:57:44 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 17 03:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 03:01:33 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 17 03:01:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 66c35cdc-b523-7b01-6539-1d69a77b5fba (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e420bc400, cur 1563357709 expire 1563357559 last 1563357482 Jul 17 03:01:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 03:02:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 03:02:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 03:07:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 03:07:54 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 03:08:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 03:08:52 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 03:11:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 03:11:49 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 17 03:13:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 03:18:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 03:18:39 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 03:21:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 03:21:45 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 17 03:21:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 03:21:54 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 17 03:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 03:28:51 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 03:29:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 17 03:29:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 03:32:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 03:32:17 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 17 03:32:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 03:32:17 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 17 03:33:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a3a5c5e3-99db-9eb1-90b6-29400007c982 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25203fac00, cur 1563359628 expire 1563359478 last 1563359401 Jul 17 03:33:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 03:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 03:39:15 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 17 03:42:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 03:42:42 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 17 03:42:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 03:42:42 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 17 03:46:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 03:46:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 03:49:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 03:49:16 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 17 03:53:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 03:53:34 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 17 03:54:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 03:54:31 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 17 03:58:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 03:58:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 03:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 03:59:58 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 17 04:03:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 04:03:35 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 04:05:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 32733661-c443-9c78-2577-6da588bf656d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f214582b400, cur 1563361546 expire 1563361396 last 1563361319 Jul 17 04:05:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 04:06:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 17 04:06:43 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 17 04:08:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 04:08:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 04:11:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 04:11:33 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 04:13:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 04:13:35 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 04:16:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 04:16:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 04:21:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 04:21:47 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 04:21:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 40dfc807-3d95-1fa9-56c6-bceaed32d63e (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34a0fda400, cur 1563362519 expire 1563362369 last 1563362292 Jul 17 04:21:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 04:23:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 04:23:36 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 17 04:27:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 04:27:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 04:32:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 04:32:02 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 17 04:33:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 04:33:42 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 17 04:34:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8162bbbc-8202-cbd7-3be7-53ea19cb0df9 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2345bf2000, cur 1563363288 expire 1563363138 last 1563363061 Jul 17 04:34:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 04:34:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 04:34:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 04:38:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 04:38:59 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 04:39:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cdd136a8-f93c-96af-0449-16dde3f68eec (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0ab998fc00, cur 1563363573 expire 1563363423 last 1563363346 Jul 17 04:39:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 04:40:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 04:42:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 04:42:08 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 17 04:44:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 04:44:21 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 17 04:46:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 04:50:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 04:50:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 04:52:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 04:52:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 04:52:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 04:52:27 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 17 04:54:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 04:54:31 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 05:00:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 05:00:44 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 05:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 05:02:39 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 17 05:04:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 05:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9d2c8374-3c47-8752-b72b-18e788934a51 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3177a4fc00, cur 1563365071 expire 1563364921 last 1563364844 Jul 17 05:04:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 05:04:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 05:04:35 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 17 05:11:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 05:11:47 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 17 05:12:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 05:12:59 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 17 05:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 05:15:01 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 17 05:21:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 05:21:48 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 17 05:22:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 05:25:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 05:25:35 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 17 05:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 05:25:57 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 05:36:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 05:36:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 05:36:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 05:36:00 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 17 05:37:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 05:37:19 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 17 05:40:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 05:40:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 05:46:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 05:46:06 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 17 05:46:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 05:46:20 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 05:47:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 05:47:36 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 17 05:50:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 05:50:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 05:51:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e22b8874-23c2-f6ed-cfaf-2accf516c2c6 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f29fba70000, cur 1563367885 expire 1563367735 last 1563367658 Jul 17 05:51:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 05:56:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 05:56:15 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 17 05:56:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 05:56:27 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 05:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 05:57:53 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 17 06:01:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 06:01:03 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 06:06:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 06:06:16 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 17 06:06:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 06:06:28 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 17 06:08:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 06:08:01 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 06:16:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 06:16:31 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 17 06:17:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 06:17:58 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 17 06:18:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 06:18:33 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 17 06:19:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 06:19:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 17 06:26:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 06:26:38 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 06:28:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 06:28:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 06:29:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 06:29:13 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 06:29:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 06:29:47 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 17 06:30:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client adc31eca-2c74-c5dc-264c-b72770599b68 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f241506a800, cur 1563370225 expire 1563370075 last 1563369998 Jul 17 06:30:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 06:31:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 222 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c9f79c800, cur 1563370301 expire 1563370151 last 1563370079 Jul 17 06:31:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 06:34:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f449cd4c800, cur 1563370485 expire 1563370335 last 1563370258 Jul 17 06:37:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 06:37:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 06:38:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 06:38:52 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 06:41:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 06:41:18 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 17 06:42:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 06:42:44 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 06:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 17 06:47:15 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 17 06:49:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 06:49:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 17 06:53:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 06:53:16 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 17 06:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 06:57:18 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 06:59:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 06:59:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 07:00:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 17 07:00:55 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 17 07:03:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 07:03:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 17 07:07:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 07:07:37 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 17 07:09:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 45c8834a-e627-74ab-6b42-3bf831e5d4f1 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e16d66400, cur 1563372567 expire 1563372417 last 1563372340 Jul 17 07:11:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 07:11:22 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 07:13:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 07:13:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 07:14:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 07:14:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 07:17:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 07:17:41 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 17 07:23:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 07:23:28 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 07:25:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 07:25:01 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 07:26:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 07:26:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 07:27:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 07:27:55 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 17 07:33:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 07:33:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 17 07:35:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 07:35:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 07:38:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 07:38:01 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 17 07:40:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 07:40:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 07:43:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 07:43:52 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 17 07:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 07:45:27 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 07:48:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 07:48:10 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 17 07:49:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e796e449-ed61-dd3d-9756-9075e3b64d6d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2152abb000, cur 1563374972 expire 1563374822 last 1563374745 Jul 17 07:49:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 07:55:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 07:55:05 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 17 07:55:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 07:55:29 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 07:56:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 07:56:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 07:58:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 07:58:15 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 17 08:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1b90433c-235e-7531-cfe6-8ebc9f785a9b (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3213b48c00, cur 1563375802 expire 1563375652 last 1563375575 Jul 17 08:03:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 08:05:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 08:05:47 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 17 08:06:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 08:06:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 08:06:52 fir-md1-s1 kernel: Lustre: 21369:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563376005/real 1563376005] req@ffff8f124d64ce00 x1636735226703088/t0(0) o104->fir-MDT0000@10.8.27.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563376012 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 17 08:06:52 fir-md1-s1 kernel: Lustre: 21369:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 17 08:06:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client affea8a0-cf20-4308-3250-8962e5fd8e0c (at 10.8.27.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24efce7400, cur 1563376019 expire 1563375869 last 1563375792 Jul 17 08:06:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 08:08:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 08:08:28 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 08:09:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 08:09:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 08:16:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 08:16:51 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 17 08:19:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3457b4ac00, cur 1563376755 expire 1563376605 last 1563376528 Jul 17 08:19:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 08:20:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 08:20:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 17 08:22:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 08:22:44 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 08:23:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f165aa1bc00, cur 1563377007 expire 1563376857 last 1563376780 Jul 17 08:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 622ac670-ed55-e5f8-8ee1-c83e956c11c4 (at 10.9.101.39@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2537d5b400, cur 1563377046 expire 1563376896 last 1563376819 Jul 17 08:24:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 17 08:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2cc0bc1b-7a1f-9dab-b36c-c6206a02385d (at 10.8.20.20@o2ib6) in 169 seconds. I think it's dead, and I am evicting it. exp ffff8f34f0b22c00, cur 1563377122 expire 1563376972 last 1563376953 Jul 17 08:25:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 17 08:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2cc0bc1b-7a1f-9dab-b36c-c6206a02385d (at 10.8.20.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3509f52c00, cur 1563377180 expire 1563377030 last 1563376953 Jul 17 08:27:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 08:27:09 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 17 08:27:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4f01ae82-88a9-e69d-8062-163ff44e7451 (at 10.8.23.14@o2ib6) in 162 seconds. I think it's dead, and I am evicting it. exp ffff8f20386d5400, cur 1563377256 expire 1563377106 last 1563377094 Jul 17 08:27:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 17 08:28:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4f01ae82-88a9-e69d-8062-163ff44e7451 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f18bb692400, cur 1563377321 expire 1563377171 last 1563377094 Jul 17 08:28:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 17 08:30:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 08:30:38 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 17 08:32:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 08:32:55 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 17 08:37:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 08:37:17 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 17 08:40:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 08:40:57 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 17 08:42:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 08:42:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 08:42:58 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 08:42:58 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 17 08:46:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 08:49:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 08:49:19 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 17 08:50:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 08:50:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 08:51:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 08:51:02 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 08:51:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96ab2bce-0537-192b-6ab0-34fd5870d665 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c4f38f000, cur 1563378698 expire 1563378548 last 1563378471 Jul 17 08:53:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 08:53:21 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 08:56:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 590b5514-2fde-e913-bede-dd5040a656f6 (at 10.9.103.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252bab9000, cur 1563379019 expire 1563378869 last 1563378792 Jul 17 08:56:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 08:57:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 08:57:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 08:57:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e1549bb2-5a11-0470-1e11-97fceeb90247 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3435bba800, cur 1563379031 expire 1563378881 last 1563378804 Jul 17 08:57:11 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 17 08:59:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 08:59:59 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 17 09:01:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 09:01:38 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 09:02:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client edb2d126-8716-249b-1676-463cf5b8fed2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2639c5f800, cur 1563379364 expire 1563379214 last 1563379137 Jul 17 09:02:44 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 17 09:05:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 09:05:27 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 17 09:10:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5b6d5b7b-9195-c890-0245-1cc5d002a943 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e5f032400, cur 1563379825 expire 1563379675 last 1563379598 Jul 17 09:10:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 09:10:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 09:10:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 17 09:13:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 09:13:02 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 17 09:15:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 09:15:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 09:15:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 09:15:56 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 17 09:20:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 09:20:55 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 09:23:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c607975-0d2c-59f3-e3ac-3ea5d87af78a (at 10.9.107.37@o2ib4) Jul 17 09:23:29 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 09:27:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 09:27:08 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 17 09:28:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 09:28:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 09:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 09:31:04 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 17 09:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 09:33:53 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 09:37:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 09:37:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 09:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a833c0a7-f9c4-8d43-6756-8f4d74f3c339 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2067234400, cur 1563381594 expire 1563381444 last 1563381367 Jul 17 09:39:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 09:40:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 09:40:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 09:41:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 09:41:13 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 17 09:43:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 09:43:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 17 09:43:51 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3177b40c00, cur 1563381831 expire 1563381681 last 1563381604 Jul 17 09:43:51 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 17 09:43:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 09:43:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 09:47:18 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 09:47:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 09:47:52 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 17 09:51:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 09:51:19 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 09:52:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 09:52:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 09:53:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 09:53:59 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 17 09:55:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a55d5759-498f-2fc9-3549-79bedf3510ab (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f36cd25c800, cur 1563382538 expire 1563382388 last 1563382311 Jul 17 09:56:46 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 09:58:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 09:58:32 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 17 10:02:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 10:02:01 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 17 10:02:03 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 10:04:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 10:04:08 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 17 10:05:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 10:05:59 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 17 10:08:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 10:08:49 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 10:12:30 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 10:12:30 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 17 10:13:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 10:13:01 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 17 10:14:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 10:14:23 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 17 10:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client be490749-aadb-9c98-b99e-b07e8d1a8aea (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19e56a7c00, cur 1563383716 expire 1563383566 last 1563383489 Jul 17 10:15:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 10:16:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 10:16:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 10:16:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 26199476-1172-6d72-2d89-097a06f33aa8 (at 10.8.23.14@o2ib6) in 216 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1618908800, cur 1563383792 expire 1563383642 last 1563383576 Jul 17 10:16:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 10:18:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 10:18:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 17 10:23:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 10:23:33 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 17 10:25:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 10:25:00 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 17 10:28:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 10:28:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 10:29:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 10:29:06 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 17 10:34:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 10:34:19 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 17 10:35:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 10:35:02 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 17 10:36:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f23308ad400, cur 1563384983 expire 1563384833 last 1563384756 Jul 17 10:36:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 10:37:18 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 10:37:18 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 17 10:39:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 10:39:02 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 17 10:39:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 10:39:07 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 17 10:45:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 10:45:03 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 17 10:45:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 10:45:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 10:49:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 10:49:41 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 10:51:12 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 10:51:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 10:51:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 10:55:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 10:55:08 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 17 10:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 10:55:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 17 10:55:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 10:57:42 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 10:59:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 10:59:48 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 17 11:04:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 11:04:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 11:05:32 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:05:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Jul 17 11:05:48 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 17 11:05:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 318b35ea-f34f-8f18-3990-714f3dfc1f43 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24af0bd000, cur 1563386756 expire 1563386606 last 1563386529 Jul 17 11:06:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 11:06:03 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 11:09:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 11:09:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 17 11:15:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 11:15:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 11:15:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 11:15:52 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 17 11:17:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 11:17:14 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 11:18:14 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:22:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:22:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 17 11:22:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 11:22:27 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 17 11:23:41 fir-md1-s1 kernel: Lustre: 23632:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ 
Couldn't add any time (5/5), not sending early reply req@ffff8f127d559800 x1637981944809296/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:16/0 lens 576/3264 e 1 to 0 dl 1563387826 ref 2 fl Interpret:/0/0 rc 0/0 Jul 17 11:26:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 11:26:31 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 17 11:27:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 11:27:24 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 17 11:28:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 11:28:07 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 17 11:28:08 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:28:49 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:28:49 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 17 11:30:15 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:30:15 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 17 11:32:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dabeaea3-ad90-b7ec-735a-adc59e227468 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2014f41800, cur 1563388351 expire 1563388201 last 1563388124 Jul 17 11:32:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 17 11:33:06 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:33:06 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Jul 17 11:35:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 11:35:06 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 17 11:36:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 17 11:36:43 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 17 11:37:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 36eb4bb0-e598-816b-e640-ae0274342fad (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fe7e1ec00, cur 1563388624 expire 1563388474 last 1563388397 Jul 17 11:37:04 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 17 11:37:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 11:37:26 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 17 11:38:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:38:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 21 previous similar messages Jul 17 11:43:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 11:43:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 11:45:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 11:45:10 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 11:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jul 17 11:46:51 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 17 11:47:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 11:47:27 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 17 11:48:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:48:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 67 previous similar messages Jul 17 11:53:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3c075966-1b5c-a2cb-236c-ffc3f1e813d2 (at 10.9.107.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d44fcd400, cur 1563389620 expire 1563389470 last 1563389393 Jul 17 11:53:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 11:55:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 17 11:55:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 11:56:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 11:56:53 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 17 11:57:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 11:57:28 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 17 11:58:53 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 11:58:53 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 84 previous similar messages Jul 17 12:03:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 12:03:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 12:05:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 12:05:23 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 17 12:06:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 12:06:53 fir-md1-s1 kernel: Lustre: Skipped 177 previous similar messages Jul 17 12:07:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 12:07:35 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 17 12:08:59 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 12:08:59 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 102 previous similar messages Jul 17 12:13:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 12:13:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 12:16:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 12:16:48 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 17 12:16:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 12:16:56 fir-md1-s1 kernel: Lustre: Skipped 154 previous similar messages Jul 17 12:17:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 12:17:40 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 17 12:19:16 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 12:19:16 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 89 previous similar messages Jul 17 12:26:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 12:26:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 17 12:26:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 12:26:59 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 17 12:27:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 12:27:48 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 17 12:29:27 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 12:29:27 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 77 previous similar messages Jul 17 12:30:57 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 12:30:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 12:36:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 12:36:58 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 12:36:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 12:36:59 fir-md1-s1 kernel: Lustre: Skipped 175 previous similar messages Jul 17 12:37:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 12:37:50 fir-md1-s1 kernel: Lustre: Skipped 135 previous similar messages Jul 17 12:39:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 12:39:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 117 previous similar messages Jul 17 12:41:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 12:41:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 12:46:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 12:46:59 fir-md1-s1 kernel: Lustre: Skipped 193 previous similar messages Jul 17 12:47:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 12:47:04 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 17 12:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 12:47:52 fir-md1-s1 kernel: Lustre: Skipped 133 previous similar messages Jul 17 12:49:29 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 12:49:29 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 117 previous similar messages Jul 17 12:52:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bce998000, cur 1563393133 expire 1563392983 last 1563392906 Jul 17 12:52:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 12:52:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 12:52:59 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 12:57:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 12:57:01 fir-md1-s1 kernel: Lustre: Skipped 199 previous similar messages Jul 17 12:57:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 12:57:15 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 17 12:57:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 12:57:55 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 17 12:59:33 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 12:59:33 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 77 previous similar messages Jul 17 13:04:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 13:04:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 13:07:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 17 13:07:07 fir-md1-s1 kernel: Lustre: Skipped 168 previous similar messages Jul 17 13:08:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 13:08:00 fir-md1-s1 kernel: Lustre: Skipped 123 previous similar messages Jul 17 13:09:35 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 13:09:35 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 83 previous similar messages Jul 17 13:09:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 13:09:35 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 17 13:16:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 13:16:25 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 17 13:17:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 13:17:12 fir-md1-s1 kernel: Lustre: Skipped 225 previous similar messages Jul 17 13:18:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 13:18:06 fir-md1-s1 kernel: Lustre: Skipped 162 previous similar messages Jul 17 13:19:36 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 13:19:36 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 137 previous similar messages Jul 17 13:22:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 13:22:08 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 17 13:27:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 13:27:15 fir-md1-s1 kernel: Lustre: Skipped 220 previous similar messages Jul 17 13:28:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 13:28:11 fir-md1-s1 kernel: Lustre: Skipped 173 previous similar messages Jul 17 13:29:40 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 13:29:40 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 151 previous similar messages Jul 17 13:32:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 13:32:13 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 17 13:33:49 fir-md1-s1 
kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 13:33:49 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 17 13:37:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 13:37:22 fir-md1-s1 kernel: Lustre: Skipped 143 previous similar messages Jul 17 13:38:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 13:38:12 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 17 13:39:47 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 13:39:47 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 50 previous similar messages Jul 17 13:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1aff394400, cur 1563396043 expire 1563395893 last 1563395816 Jul 17 13:45:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 13:45:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 17 13:46:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 13:46:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 13:47:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 13:47:31 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 17 13:48:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 13:48:25 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 17 13:49:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 13:49:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 69 previous similar messages Jul 17 13:55:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 13:55:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 13:57:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 13:57:38 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 17 13:58:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 13:58:33 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 17 14:00:16 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 14:00:16 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 56 previous similar messages Jul 17 14:01:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 14:06:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 14:06:20 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 17 14:07:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 14:07:40 fir-md1-s1 kernel: Lustre: Skipped 131 previous similar messages Jul 17 14:08:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 14:08:40 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 17 14:10:21 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 14:10:21 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 78 previous similar messages Jul 17 14:16:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 14:16:24 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 14:17:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 14:17:42 fir-md1-s1 kernel: Lustre: Skipped 160 previous similar messages Jul 17 14:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 14:18:45 fir-md1-s1 kernel: Lustre: Skipped 123 previous similar messages Jul 17 14:20:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 14:20:20 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 17 14:20:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 14:20:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 97 previous similar messages Jul 17 14:27:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 14:27:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 17 14:27:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 14:27:44 fir-md1-s1 kernel: Lustre: Skipped 166 previous similar messages Jul 17 14:28:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 14:28:49 fir-md1-s1 kernel: Lustre: Skipped 142 previous similar messages Jul 17 14:30:25 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 14:30:25 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 118 previous similar messages Jul 17 14:36:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 14:36:28 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 17 14:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 14:37:52 fir-md1-s1 kernel: Lustre: Skipped 186 previous similar messages Jul 17 14:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 14:38:51 fir-md1-s1 kernel: Lustre: Skipped 140 previous similar messages Jul 17 14:40:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 14:40:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 110 previous similar messages Jul 17 14:40:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 17 14:40:49 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 17 14:47:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 14:47:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 14:47:54 fir-md1-s1 kernel: Lustre: Skipped 153 previous similar messages Jul 17 14:48:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 14:48:53 fir-md1-s1 kernel: Lustre: Skipped 139 previous similar messages Jul 17 14:50:32 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 14:50:32 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 119 previous similar messages Jul 17 14:51:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 17 14:51:54 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 14:58:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 14:58:00 fir-md1-s1 kernel: Lustre: Skipped 172 previous similar messages Jul 17 14:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 14:59:01 fir-md1-s1 kernel: Lustre: Skipped 148 previous similar messages Jul 17 15:00:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 15:00:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 15:00:38 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 15:00:38 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 115 previous similar messages Jul 17 15:03:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 15:03:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 17 15:08:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 15:08:04 fir-md1-s1 kernel: Lustre: Skipped 193 previous similar messages Jul 17 15:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c1921cc00, cur 1563401339 expire 1563401189 last 1563401112 Jul 17 15:09:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 15:09:02 fir-md1-s1 kernel: Lustre: Skipped 153 previous similar messages Jul 17 15:10:43 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 15:10:43 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 127 previous similar messages Jul 17 15:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 15:11:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 15:13:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 15:13:41 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 17 15:18:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Jul 17 15:18:06 fir-md1-s1 kernel: Lustre: Skipped 217 previous similar messages Jul 17 15:19:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) reconnecting Jul 17 15:19:03 fir-md1-s1 kernel: Lustre: Skipped 162 previous similar messages Jul 17 15:21:07 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 15:21:07 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 124 previous similar messages Jul 17 15:23:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 15:23:59 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 17 15:27:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 15:27:06 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 17 15:28:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 15:28:07 fir-md1-s1 kernel: Lustre: Skipped 129 previous similar messages Jul 17 15:29:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 15:29:05 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 17 15:31:11 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 15:31:11 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 59 previous similar messages Jul 17 15:35:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 15:35:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 15:37:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 15:37:17 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 17 15:38:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 15:38:13 fir-md1-s1 kernel: Lustre: Skipped 157 previous similar messages Jul 17 15:39:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 15:39:14 fir-md1-s1 kernel: Lustre: Skipped 129 previous similar messages Jul 17 15:41:14 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 15:41:14 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 83 previous similar messages Jul 17 15:47:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 15:47:17 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 15:47:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 15:47:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 15:48:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jul 17 15:48:18 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Jul 17 15:49:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 15:49:19 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 17 15:51:19 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 15:51:19 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 146 previous similar messages Jul 17 15:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 15:58:19 fir-md1-s1 kernel: Lustre: Skipped 218 previous similar messages Jul 17 15:58:31 fir-md1-s1 kernel: Lustre: 23623:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0698235700 x1639321679831200/t0(0) o101->39e76845-4976-21c9-38bb-bb738759d72c@10.9.0.64@o2ib4:6/0 lens 576/3264 e 1 to 0 dl 1563404316 ref 2 fl Interpret:/0/0 rc 0/0 Jul 17 15:59:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 15:59:23 fir-md1-s1 kernel: Lustre: Skipped 192 previous similar messages Jul 17 16:01:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 16:01:14 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 17 16:01:21 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 16:01:21 fir-md1-s1 kernel: LNetError: 
20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 214 previous similar messages Jul 17 16:08:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 16:08:21 fir-md1-s1 kernel: Lustre: Skipped 218 previous similar messages Jul 17 16:09:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 16:09:24 fir-md1-s1 kernel: Lustre: Skipped 156 previous similar messages Jul 17 16:09:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 16:09:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 16:11:23 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 16:11:23 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 195 previous similar messages Jul 17 16:12:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 16:12:44 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 17 16:18:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Jul 17 16:18:26 fir-md1-s1 kernel: Lustre: Skipped 185 previous similar messages Jul 17 16:18:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 16:19:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 16:19:26 fir-md1-s1 kernel: Lustre: Skipped 166 previous similar messages Jul 17 16:21:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 16:21:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 206 previous similar messages Jul 17 16:27:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 16:27:01 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 16:28:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 16:28:27 fir-md1-s1 kernel: Lustre: Skipped 180 previous similar messages Jul 17 16:29:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 16:29:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 16:29:28 fir-md1-s1 kernel: Lustre: Skipped 163 previous similar messages Jul 17 16:31:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 16:31:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 207 previous similar messages Jul 17 16:38:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Jul 17 16:38:28 fir-md1-s1 kernel: Lustre: Skipped 198 previous similar messages Jul 17 16:39:12 fir-md1-s1 kernel: Lustre: 21567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1509cf9450 x1638242669475296/t0(0) o3->b74b4b66-65f0-f951-331c-463b7f96e033@10.9.0.62@o2ib4:17/0 lens 488/4536 e 1 to 0 dl 1563406757 ref 2 fl Interpret:/0/0 rc 0/0 Jul 17 16:39:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Jul 17 16:39:37 fir-md1-s1 kernel: Lustre: Skipped 158 previous similar messages Jul 17 16:41:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 16:41:07 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 17 16:41:49 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 16:41:49 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 168 previous similar messages Jul 17 16:42:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 16:42:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 16:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 16:48:30 fir-md1-s1 kernel: Lustre: Skipped 135 previous similar messages Jul 17 16:49:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) reconnecting Jul 17 16:49:38 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 17 16:51:51 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 16:51:51 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 97 previous similar messages Jul 17 16:52:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 16:52:28 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 17 16:55:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 16:55:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 16:58:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 16:58:36 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 17 16:59:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 16:59:45 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 17 17:02:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 17:02:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Jul 17 17:03:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 17:03:20 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 17:05:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 17:05:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 17:08:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 17:08:37 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 17 17:10:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 17:10:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 17:14:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 17:14:17 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 17 17:18:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 17:18:38 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 17 17:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 17:20:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 17 17:20:49 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 17 17:20:49 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Jul 17 17:24:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 17 17:24:45 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 17:26:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 17:28:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 17:28:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 17:29:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 17:29:28 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 17 17:30:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 17:30:40 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 17:38:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 17:38:47 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 17 17:39:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 17:39:53 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 17 17:40:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 17:40:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 17 17:40:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 17:48:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 17:48:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 17:49:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 17:49:36 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 17 17:49:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 17:49:54 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 17 17:50:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 17:50:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 17:59:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 17:59:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 17 17:59:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 17 17:59:59 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 17 18:01:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 18:01:38 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 17 18:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3b8583db-9b25-3a84-4b44-4c626faa0d2b (at 10.8.30.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f45018df000, cur 1563412171 expire 1563412021 last 1563411944 Jul 17 18:10:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 18:10:03 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 17 18:12:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 18:12:16 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 18:13:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 18:13:45 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 17 18:19:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 18:19:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 18:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 18:20:41 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 17 18:21:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 18:21:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 18:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 18:22:29 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 18:25:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 18:25:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 17 18:27:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 17 18:27:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 18:31:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 18:31:06 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 17 18:32:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 18:32:39 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 18:34:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 18:37:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 18:37:55 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 17 18:41:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 17 18:41:08 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 17 18:42:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 18:42:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 17 18:44:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 18:44:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 18:48:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 18:48:01 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 17 18:51:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 18:51:15 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 17 18:52:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 18:52:50 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 18:59:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 18:59:06 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 17 19:00:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 19:01:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 19:01:21 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 17 19:03:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 19:03:23 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 19:10:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 19:10:01 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 19:11:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 19:11:45 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 17 19:14:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 19:14:03 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 17 19:19:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 19:19:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 17 19:21:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 17 19:21:48 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 17 19:24:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 19:24:00 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 17 19:24:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 19:24:18 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 19:32:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 19:32:06 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 17 19:33:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 19:33:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 17 19:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 19:34:18 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 17 19:35:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 19:35:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 17 19:42:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 19:42:09 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 17 19:44:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 19:44:58 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 19:45:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 19:45:44 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 19:52:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 19:52:18 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 17 19:55:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 19:55:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 17 19:55:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 19:56:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 17 19:56:24 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 17 20:02:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 20:02:27 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 17 20:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 20:05:32 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 20:06:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 20:06:30 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 17 20:12:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 20:12:39 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 17 20:15:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 20:16:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 20:16:31 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 17 20:16:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 20:16:32 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 17 20:21:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 55958873-d9dc-b883-9e11-ee3acb1552f7 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f14d8cabc00, cur 1563420077 expire 1563419927 last 1563419850 Jul 17 20:21:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 20:21:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4690ff61-50bc-9dea-7542-9d11ccee3209 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f153c1bcc00, cur 1563420090 expire 1563419940 last 1563419863 Jul 17 20:21:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4690ff61-50bc-9dea-7542-9d11ccee3209 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f453bdf4c00, cur 1563420095 expire 1563419945 last 1563419868 Jul 17 20:22:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 20:22:40 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 17 20:26:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 20:26:43 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 17 20:26:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 20:26:46 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 20:27:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 20:28:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 20:31:26 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 17 20:31:26 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 17 20:32:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 20:32:42 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 17 20:37:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 20:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 20:37:03 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 17 20:37:03 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 20:42:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 17 20:42:55 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 17 20:47:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 20:47:10 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 17 20:47:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 20:47:42 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 20:52:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 20:52:56 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 17 20:57:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 17 20:57:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages 
Jul 17 20:58:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 20:58:15 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 21:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 21:03:13 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 17 21:07:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 21:07:32 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 17 21:08:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 21:08:36 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 17 21:13:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 21:13:17 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 17 21:17:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 21:17:32 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 17 21:18:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 21:19:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 21:19:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 17 21:23:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 21:23:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 21:23:45 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 17 21:24:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 21:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 21:29:37 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 17 21:30:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 21:30:07 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 17 21:34:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 21:34:01 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 17 21:34:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 21:36:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 21:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 21:39:54 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 21:40:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 45297336-15eb-ba8c-1681-805168566731 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e9be3b400, cur 1563424834 expire 1563424684 last 1563424607 Jul 17 21:41:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 17 21:41:43 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 17 21:44:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 21:44:17 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 17 21:50:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 21:50:15 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 17 21:51:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1247d0dc00, cur 1563425519 expire 1563425369 last 1563425292 Jul 17 21:51:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 21:52:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 21:52:19 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 17 21:54:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 21:54:34 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 21:55:45 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 17 21:55:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dbfe3fcc-693c-e83c-dfc7-3d728fe80694 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34a8d54c00, cur 1563425756 expire 1563425606 last 1563425529 Jul 17 21:59:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:00:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 22:00:28 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 22:02:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 22:02:25 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 17 22:04:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 22:04:36 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 17 22:09:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:10:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 22:10:36 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 22:10:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:11:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:11:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:12:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:12:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 22:12:35 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 17 22:15:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 22:15:32 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 17 22:20:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 22:20:39 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 17 22:22:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 22:22:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 17 22:25:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 22:25:35 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 17 22:30:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 22:30:51 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 22:32:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 22:32:39 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 17 22:34:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:35:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2095b8a800, cur 1563428115 expire 1563427965 last 1563427888 Jul 17 22:35:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 22:35:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 22:35:36 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 17 22:39:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e1197ed2-4015-b05b-9996-2925a72ba8c9 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f29f55bec00, cur 1563428345 expire 1563428195 last 1563428118 Jul 17 22:40:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 22:40:52 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 17 22:45:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 17 22:45:43 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 17 22:46:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 17 22:46:36 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 17 22:49:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 22:50:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 22:51:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 17 22:51:18 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 17 22:54:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c98b7c3b-2dd3-dd94-3956-3c9eddd0473f (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e03e64400, cur 1563429297 expire 1563429147 last 1563429070 Jul 17 22:54:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 22:55:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Jul 17 22:55:47 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 17 22:57:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 22:57:18 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 17 22:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:03:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 17 23:03:05 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 17 23:05:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 23:05:50 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 17 23:09:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 23:09:24 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 17 23:10:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 096e9fd5-793c-068e-aaeb-9b51f1824475 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f348bfc4c00, cur 1563430239 expire 1563430089 last 1563430012 Jul 17 23:10:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 23:12:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:13:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 23:13:17 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 17 23:15:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 17 23:15:56 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 17 23:19:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 23:19:27 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 17 23:22:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:23:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 17 23:23:46 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 17 23:23:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 373b4b4a-d804-db20-0066-c55c63f43da2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24b6ea2400, cur 1563431033 expire 1563430883 last 1563430806 Jul 17 23:23:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 17 23:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 373b4b4a-d804-db20-0066-c55c63f43da2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2538ee1400, cur 1563431053 expire 1563430903 last 1563430826 Jul 17 23:25:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 23:25:58 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 17 23:29:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 23:29:41 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 17 23:34:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 23:34:09 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 23:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 23:36:01 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 17 23:36:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:39:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 23:39:45 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 17 23:41:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 17 23:44:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client db7e1012-8983-e0ae-379c-f9ec6e63dfc3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23dbf5bc00, cur 1563432247 expire 1563432097 last 1563432020 Jul 17 23:44:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 17 23:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 17 23:44:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 17 23:46:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 23:46:52 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 17 23:48:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:49:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 17 23:50:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 17 23:50:18 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 17 23:54:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 17 23:54:31 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 17 23:56:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 17 23:56:57 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 17 23:57:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:00:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 00:00:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 18 00:01:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:04:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:05:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 00:05:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 00:05:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 00:07:01 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 18 00:09:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:11:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 00:11:06 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 00:12:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:12:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 00:15:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 00:15:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 00:17:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 00:17:04 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 18 00:24:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:24:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 00:25:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 00:25:36 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 18 00:26:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 00:26:13 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 00:27:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 00:27:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 00:32:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8e32682d-32de-951d-fbd3-5ec1098902ed (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22442a5800, cur 1563435122 expire 1563434972 last 1563434895 Jul 18 00:32:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 00:35:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 00:35:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 00:36:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 00:36:49 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 00:37:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 00:37:11 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 00:42:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:45:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 13bcf1e5-aff5-9453-b84c-ec8d544b7fa5 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2dfc309400, cur 1563435950 expire 1563435800 last 1563435723 Jul 18 00:45:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 00:46:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 00:46:14 fir-md1-s1 kernel: Lustre: Skipped 12826 previous similar messages Jul 18 00:47:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 00:47:07 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 00:47:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 00:47:14 fir-md1-s1 kernel: Lustre: Skipped 12852 previous similar messages Jul 18 00:55:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 00:55:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 00:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 00:56:21 fir-md1-s1 kernel: Lustre: Skipped 5742 previous similar messages Jul 18 00:57:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 00:57:18 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 00:57:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 00:57:18 fir-md1-s1 kernel: Lustre: Skipped 5774 previous similar messages Jul 18 01:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9385423e-927c-ff58-9474-f3921e39ddff (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33aa0a4000, cur 1563437023 expire 1563436873 last 1563436796 Jul 18 01:03:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 01:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 01:06:29 fir-md1-s1 kernel: Lustre: Skipped 8946 previous similar messages Jul 18 01:07:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 01:07:21 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 01:07:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 01:07:21 fir-md1-s1 kernel: Lustre: Skipped 8988 previous similar messages Jul 18 01:08:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 01:08:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 01:17:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 01:17:18 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 01:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 01:17:28 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 18 01:17:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 01:17:47 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 18 01:21:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 01:21:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 01:21:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f4420672-3b62-ca85-71df-6fc1399a0760 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2830581800, cur 1563438064 expire 1563437914 last 1563437837 Jul 18 01:21:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 18 01:27:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 01:27:22 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 01:27:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 01:27:38 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 01:30:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 01:30:26 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 01:33:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 01:33:34 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 18 01:37:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b315c279-877c-05de-2b72-c6cd2c0f5c96 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2101f10c00, cur 1563439045 expire 1563438895 last 1563438818 Jul 18 01:37:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 01:37:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 01:37:49 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 01:37:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 01:37:49 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 18 01:40:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 18 01:40:39 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 18 01:46:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 01:46:09 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 18 01:48:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 01:48:10 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 01:48:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 01:48:10 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 18 01:50:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 01:50:43 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 01:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 01:58:19 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 01:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 01:58:19 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 02:02:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 02:02:36 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 18 02:02:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 02:02:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 02:08:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 02:08:26 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 02:08:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 02:08:26 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 02:12:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 02:12:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 18 02:16:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 02:16:58 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 02:18:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 02:18:37 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 02:18:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 02:18:37 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 18 02:23:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 02:23:39 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 18 02:24:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 05ceaeb5-892d-4480-f97f-b0f7e123f035 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f212187f800, cur 1563441897 expire 1563441747 last 1563441670 Jul 18 02:24:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 02:28:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 02:28:53 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 02:28:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 02:28:53 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 02:33:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 02:33:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 02:35:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 02:35:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 02:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2e624a59-45ce-1992-3707-e08bbd191d11 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c6836d800, cur 1563442635 expire 1563442485 last 1563442408 Jul 18 02:37:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 02:39:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 02:39:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 02:39:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 02:39:00 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 18 02:45:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 18 02:45:36 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 02:49:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 02:49:41 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 02:49:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 02:49:45 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 02:51:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 02:51:35 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 02:55:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 66cc070c-277e-d72c-df2b-26a2a3dd30d4 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27581e8800, cur 1563443747 expire 1563443597 last 1563443520 Jul 18 02:55:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 02:55:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 18 02:55:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 02:57:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a48d67d8-7a44-4e09-9f14-23be646295c6 (at 10.8.23.14@o2ib6) in 200 seconds. I think it's dead, and I am evicting it. exp ffff8f2ef180e000, cur 1563443823 expire 1563443673 last 1563443623 Jul 18 02:57:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 02:57:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 4dea06d6-e683-afaf-c4f1-01947c7c4e94 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34c2f8b400, cur 1563443850 expire 1563443700 last 1563443623 Jul 18 02:57:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 02:58:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 152 seconds. I think it's dead, and I am evicting it. 
exp ffff8f340f348000, cur 1563443926 expire 1563443776 last 1563443774 Jul 18 02:59:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 02:59:55 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 02:59:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 02:59:55 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 18 03:07:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 03:07:16 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 03:09:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 03:09:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 03:10:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 03:10:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 03:10:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 03:10:00 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 03:18:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 03:18:21 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 18 03:20:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 03:20:01 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 18 03:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) 
reconnecting Jul 18 03:20:21 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 03:20:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 03:20:58 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 03:28:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b34e63b3-c410-4c67-de9c-e57f38c21974 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f228339ac00, cur 1563445683 expire 1563445533 last 1563445456 Jul 18 03:28:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 03:28:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 03:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 03:30:32 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 03:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 03:30:32 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 18 03:33:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 03:33:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 03:34:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 03:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aea2fc3f-f9fa-591e-4f76-76f57c7e6ec4 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1fa3e94c00, cur 1563446158 expire 1563446008 last 1563445931 Jul 18 03:35:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 03:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 03:40:43 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 03:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 03:40:43 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 03:41:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 03:41:31 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 03:48:00 fir-md1-s1 kernel: Lustre: 21433:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563446873/real 1563446873] req@ffff8f17ecdbbc00 x1636737030521376/t0(0) o104->fir-MDT0000@10.8.12.12@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563446880 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 03:50:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 03:50:51 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 03:50:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 03:50:51 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 18 03:53:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64f30b86-a60b-9dab-9c1a-695c0e3b7108 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1696bac000, cur 1563447182 expire 1563447032 last 1563446955 Jul 18 03:53:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 03:53:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 03:53:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 03:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a8281b7d-e049-6940-3022-cd6b4f5c67e7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1aa890c800, cur 1563447397 expire 1563447247 last 1563447170 Jul 18 03:56:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 04:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 04:00:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 04:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 04:00:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 18 04:03:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 04:03:11 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 04:06:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f87d4b64-4413-b8bc-2d2f-da79055e990d (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3501578400, cur 1563447961 expire 1563447811 last 1563447734 Jul 18 04:06:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 04:06:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f87d4b64-4413-b8bc-2d2f-da79055e990d (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2312923c00, cur 1563447973 expire 1563447823 last 1563447746 Jul 18 04:06:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 04:10:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 04:10:54 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 18 04:10:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 04:10:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 04:14:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 04:14:12 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 04:16:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 04:17:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 04:17:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 04:21:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 04:21:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 04:21:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 04:21:08 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 18 04:21:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 04:28:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 04:28:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 04:28:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9da0e5b9-2fb2-6ee8-616c-015f030d7435 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bcfeaf000, cur 1563449314 expire 1563449164 last 1563449087 Jul 18 04:30:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 04:30:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 04:31:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 04:31:28 fir-md1-s1 kernel: Lustre: Skipped 95878 previous similar messages Jul 18 04:31:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 04:31:28 fir-md1-s1 kernel: Lustre: Skipped 95907 previous similar messages Jul 18 04:34:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c24b8bff-f99c-4849-767d-bb11ab7dd32c (at 10.9.104.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f453f221000, cur 1563449667 expire 1563449517 last 1563449440 Jul 18 04:34:27 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 18 04:39:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 04:39:44 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 04:40:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fa351677-2bb5-9948-9028-7cf7d63b5587 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30d49f5800, cur 1563450022 expire 1563449872 last 1563449795 Jul 18 04:40:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 18 04:40:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 04:40:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 04:41:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 04:41:31 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 18 04:41:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 04:41:47 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 04:49:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 04:49:50 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 18 04:50:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client babb0fda-5d79-61dd-ba92-6b619b7e8da0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31b5aa8c00, cur 1563450604 expire 1563450454 last 1563450377 Jul 18 04:50:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 04:52:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 04:52:01 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 18 04:52:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 04:52:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 04:53:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 18 04:53:42 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 18 05:00:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 05:00:18 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 05:02:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 05:02:02 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 18 05:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 05:02:23 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 05:04:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 05:04:45 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 05:08:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34bdb84400, cur 1563451730 expire 1563451580 last 1563451503 Jul 18 05:08:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 05:11:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 05:11:37 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 05:11:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33f7c1b400, cur 1563451910 expire 1563451760 last 1563451683 Jul 18 05:12:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 05:12:08 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 05:12:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 05:12:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 05:15:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 05:15:56 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 05:21:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 05:21:45 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 05:21:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e8474e000, cur 1563452518 expire 1563452368 last 1563452291 Jul 18 05:22:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 05:22:10 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 18 05:22:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 05:22:36 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 05:27:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 05:27:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 05:32:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 05:32:32 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 18 05:32:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 05:32:39 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 05:33:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 05:33:36 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 05:37:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 05:37:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 05:39:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 61bcd8e0-2331-0505-86a7-0248288f7bc7 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d4dbd2400, cur 1563453581 expire 1563453431 last 1563453354 Jul 18 05:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 05:42:37 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 18 05:42:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 05:42:43 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 05:46:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 18 05:46:04 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 05:52:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 05:52:41 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 18 05:53:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 05:53:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 05:53:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 05:53:28 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 18 05:57:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 05:57:06 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 18 06:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 06:02:43 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 18 06:03:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 06:03:28 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 06:07:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 06:07:07 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 06:07:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 06:12:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 06:12:47 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 18 06:13:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 06:13:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 06:17:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 06:17:12 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 06:23:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 06:23:23 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 18 06:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 06:23:52 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 18 06:24:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 06:24:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 06:28:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 06:28:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 06:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 06:34:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 06:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 06:34:00 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 18 06:36:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 06:36:47 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 18 06:38:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 06:38:19 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 18 06:44:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 06:44:13 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 06:44:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 06:44:13 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 18 06:48:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 06:48:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 06:50:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 06:50:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 06:54:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 06:54:17 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 18 06:54:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 06:54:33 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 06:59:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 06:59:04 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 07:00:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 07:00:35 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 07:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 07:04:31 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Jul 18 07:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 07:04:48 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 18 07:09:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 07:09:26 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 18 07:14:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 07:14:36 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 18 07:14:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 07:14:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 07:19:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 07:19:26 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 18 07:23:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 07:23:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 18 07:24:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 07:24:39 fir-md1-s1 kernel: Lustre: Skipped 125 previous similar messages Jul 18 07:25:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 07:25:14 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 18 07:26:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 07:26:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 07:29:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 07:29:30 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 18 07:34:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 07:34:39 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 07:35:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 07:35:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 07:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 07:35:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 07:40:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 07:40:01 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 07:44:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 07:44:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 07:44:42 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 18 07:46:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 07:46:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 07:51:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 07:51:01 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 07:54:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 07:54:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 07:55:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 07:55:26 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 07:56:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 07:56:45 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 08:01:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 08:01:22 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 08:06:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 08:06:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 08:06:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 08:06:30 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 18 08:07:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 08:07:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 18 08:11:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 08:11:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 08:16:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 08:16:33 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 18 08:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Jul 18 08:17:27 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 08:19:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 08:19:09 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 08:21:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 08:21:40 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 18 08:26:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 08:26:40 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 18 08:27:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 08:27:56 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 08:32:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 18 08:32:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 08:32:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 08:32:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 08:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24a0c73800, cur 1563464227 expire 1563464077 last 1563464000 Jul 18 08:37:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 08:38:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 08:38:04 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 18 08:38:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 08:38:04 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 18 08:43:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 08:43:02 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 08:48:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 08:48:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 08:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 08:48:34 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 08:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 08:48:34 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 18 08:53:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 08:53:07 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 18 08:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 08:58:38 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 08:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 08:58:38 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 18 08:59:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 08:59:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 09:05:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ed048fc00, cur 1563465937 expire 1563465787 last 1563465710 Jul 18 09:06:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 09:06:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 18 09:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 09:09:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 09:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 09:09:25 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 18 09:12:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 09:12:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 09:16:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 09:16:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 09:19:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 09:19:31 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 18 09:19:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 09:19:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 09:27:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 09:27:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 09:29:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 09:29:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 09:29:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 09:29:32 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 18 09:29:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 09:29:43 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 09:39:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 09:39:34 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 18 09:39:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 09:39:48 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 18 09:40:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 09:40:13 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 18 09:40:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2168f86000, cur 1563468022 expire 1563467872 last 1563467795 Jul 18 09:40:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 09:40:51 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 09:49:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 09:49:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 09:49:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 09:49:50 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 18 09:50:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 09:50:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 09:52:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 09:52:48 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 18 10:00:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 10:00:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 10:00:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 10:00:00 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 10:01:26 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0e97b98300 x1631582122543584/t0(0) o101->fb8f22c1-ceb3-fa39-aea4-695a494d32c5@10.9.101.26@o2ib4:1/0 lens 576/3264 e 0 to 0 dl 1563469291 ref 2 fl Interpret:/0/0 rc 
0/0 Jul 18 10:01:26 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 18 10:03:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 10:03:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 10:06:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 10:06:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 10:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 10:10:13 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 10:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 10:10:13 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 10:13:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 10:13:56 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 10:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 10:20:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 10:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 10:20:24 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 18 10:22:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 10:22:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 10:24:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 10:24:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 10:30:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 10:30:41 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 10:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 10:31:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 18 10:35:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 18 10:35:28 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 10:37:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 10:37:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 10:40:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 10:40:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 18 10:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 10:41:11 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 10:46:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 10:46:04 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 10:50:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 10:50:29 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 10:51:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 10:51:00 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 18 10:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b2ea4b62-1b0e-1f82-5376-2b6f23c901d4 (at 10.8.26.18@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24efcfdc00, cur 1563472283 expire 1563472133 last 1563472056 Jul 18 10:51:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 10:51:42 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 10:58:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 18 10:58:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 11:00:18 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 11:01:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 11:01:01 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 11:02:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 11:02:05 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 11:02:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 11:02:42 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 11:08:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 11:08:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 11:11:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 11:11:22 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 18 11:12:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 11:12:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 11:14:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 11:14:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 11:20:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 11:20:19 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 11:21:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 11:21:38 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 18 11:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 11:22:33 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 11:24:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 11:24:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 11:30:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 11:30:28 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 18 11:31:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 11:31:42 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 18 11:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 11:33:03 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 11:34:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 11:34:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 11:41:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 11:41:26 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 11:41:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 11:41:48 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 18 11:43:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 11:43:19 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 18 11:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 94a353a2-2d97-c41a-db14-ad2e5d4fcfab (at 10.8.27.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3402202800, cur 1563475418 expire 1563475268 last 1563475191 Jul 18 11:43:38 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 18 11:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 94a353a2-2d97-c41a-db14-ad2e5d4fcfab (at 10.8.27.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16faa75400, cur 1563475428 expire 1563475278 last 1563475201 Jul 18 11:43:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 11:45:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 11:45:32 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 18 11:51:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 11:51:54 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 18 11:52:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 11:52:27 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 11:53:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 11:53:42 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 11:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 11:59:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 12:01:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 12:01:59 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 18 12:03:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 12:03:47 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 12:04:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 12:04:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 12:11:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 12:11:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 12:12:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 12:12:12 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 18 12:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 12:13:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 18 12:14:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 12:14:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 12:21:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 12:21:54 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 18 12:22:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 12:22:23 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 18 12:24:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 12:24:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 12:25:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 12:25:35 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 18 12:32:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 12:32:41 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 18 12:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 12:34:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 12:36:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 18 12:36:13 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 18 12:42:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 12:42:42 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 18 12:44:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 12:44:19 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 12:46:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 12:46:35 fir-md1-s1 kernel: Lustre: Skipped 40 previous 
similar messages Jul 18 12:46:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 12:46:57 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 12:49:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 12:52:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 12:52:48 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 18 12:53:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 12:53:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 12:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 12:54:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 12:56:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 12:56:54 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 12:59:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 12:59:29 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 13:02:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 13:02:51 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 18 13:04:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 13:04:46 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 13:07:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 13:07:09 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 18 13:12:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 13:12:48 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 13:12:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 13:12:54 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 13:15:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 13:15:57 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 13:17:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 13:17:32 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 18 13:23:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 13:23:11 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 18 13:24:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 13:24:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 13:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 13:26:34 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 18 13:28:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 18 13:28:09 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 13:33:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 13:33:23 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 13:35:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 13:35:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 13:36:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 13:36:53 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 13:38:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 13:38:37 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 18 13:43:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 13:43:34 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 13:46:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 18 13:46:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 13:47:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 13:47:00 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 13:53:18 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33065ae400, cur 1563483198 expire 1563483048 last 1563482971 Jul 18 13:53:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 18 13:53:44 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 18 13:53:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 13:53:44 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 18 13:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 13:57:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 14:03:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 14:03:50 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 14:03:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 14:03:50 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 18 14:07:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 14:07:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 14:09:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not 
available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 14:09:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 14:10:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 14:13:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 14:13:52 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 14:14:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 14:14:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 14:15:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 14:17:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 14:17:47 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 14:19:06 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b9dcfaa00 x1639321849112304/t0(0) o101->39e76845-4976-21c9-38bb-bb738759d72c@10.9.0.64@o2ib4:11/0 lens 584/3264 e 1 to 0 dl 1563484751 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 14:23:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 14:23:58 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 18 14:24:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 14:24:17 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 18 14:27:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 14:27:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 14:34:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 14:34:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 14:34:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 18 14:34:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 14:34:41 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 18 14:34:41 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 14:35:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 14:35:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 14:38:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 14:38:11 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 14:39:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 14:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 14:44:50 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 14:46:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 14:46:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 14:48:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 14:48:15 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 14:51:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 14:51:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 14:55:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 14:55:02 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 18 14:56:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 14:56:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 14:58:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 14:58:32 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 18 14:58:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 14:58:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 15:05:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 15:05:04 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 15:06:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 15:06:56 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 18 15:09:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 15:09:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 15:14:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 15:15:34 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 15:15:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 15:15:34 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 18 15:17:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 15:17:24 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 18 15:21:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 15:21:38 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 15:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 75eaf15b-50e1-beaf-6fdb-223b2d2cc5ee (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1796aabc00, cur 1563488504 expire 1563488354 last 1563488277 Jul 18 15:21:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0b00c403-b81b-3090-2417-d0ea5f338239 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f270de59400, cur 1563488518 expire 1563488368 last 1563488291 Jul 18 15:22:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 75eaf15b-50e1-beaf-6fdb-223b2d2cc5ee (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ea5f57000, cur 1563488528 expire 1563488378 last 1563488301 Jul 18 15:22:08 fir-md1-s1 kernel: LustreError: 20384:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2d77d5d100 x1636737479814544/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 15:25:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 15:25:36 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 18 15:27:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 18 15:27:34 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 18 15:31:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 15:31:58 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 15:35:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 15:35:41 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 18 15:39:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 18 15:39:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 18 15:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 15:42:11 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 15:44:51 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 15:45:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 15:45:54 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 15:46:20 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 15:50:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 15:50:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 15:50:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ca99bf8b-e767-dd48-4e74-d2bdf113e87a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f172be51c00, cur 1563490232 expire 1563490082 last 1563490005 Jul 18 15:52:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 15:52:34 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 15:54:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 15:54:00 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 18 15:56:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 15:56:10 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 16:02:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:02:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 16:02:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 16:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 16:06:17 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 18 16:06:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:07:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 16:07:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 16:07:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 16:08:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 16:13:41 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 18 16:16:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 16:16:21 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 18 16:18:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:21:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 16:21:32 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 18 16:24:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 16:24:02 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 18 16:25:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 16:25:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 16:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 16:26:22 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 18 16:31:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 16:31:38 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 16:32:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:32:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 16:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 16:34:18 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 16:36:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 16:36:28 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 18 16:41:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 16:41:44 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 18 16:44:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 16:44:18 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 16:44:31 fir-md1-s1 kernel: Lustre: 23627:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a6cca8000 x1631690217745824/t0(0) o101->2662ab69-aec7-2e87-a084-2a8522884959@10.8.22.22@o2ib6:6/0 lens 480/568 e 1 to 0 dl 1563493476 ref 2 fl Interpret:/0/0 rc 
0/0 Jul 18 16:45:07 fir-md1-s1 kernel: Lustre: 10305:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b11c88000 x1631568828518368/t0(0) o101->6818c063-a70b-0d7a-5ae5-0dc447ff5658@10.9.105.14@o2ib4:12/0 lens 480/568 e 1 to 0 dl 1563493512 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 16:45:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.67@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:45:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.67@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 16:45:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.22@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2ba2e6cc80/0x5d9ee65fbf555bd4 lrc: 3/0,0 mode: PW/PW res: [0x200029aa0:0x8:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.22.22@o2ib6 remote: 0xf80ab1d043a8d604 expref: 35 pid: 23749 timeout: 2608581 lvb_type: 0 Jul 18 16:46:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 16:46:30 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 18 16:50:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8d0126da-1b2a-d37f-f2a7-3619ddfecc8a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f31f3b8c400, cur 1563493857 expire 1563493707 last 1563493630 Jul 18 16:50:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 16:51:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 16:51:53 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 16:54:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 16:54:20 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 18 16:56:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 16:56:31 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 16:56:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a5a22d61-c231-21ac-f779-f64b9fce4eea (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fb365c00, cur 1563494213 expire 1563494063 last 1563493986 Jul 18 16:56:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 16:59:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 17:01:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 17:01:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 17:02:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 40d87a73-8e6e-4939-03c9-b59777357654 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2fb2d25c00, cur 1563494534 expire 1563494384 last 1563494307 Jul 18 17:02:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 17:04:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 17:04:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 17:06:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 17:06:42 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 18 17:12:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 17:12:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 18 17:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 17:14:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 17:16:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 17:16:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 18 17:18:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 17:22:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 17:22:03 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 18 17:24:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 17:24:45 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 17:27:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 17:27:01 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 18 17:32:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 17:32:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 17:32:56 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 17:35:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 17:35:03 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 17:36:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 17:37:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 17:37:16 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 18 17:42:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 17:42:51 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 17:45:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 17:45:53 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 17:47:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 17:47:39 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 18 17:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f3ba0db6-e5d3-27c6-db2c-40a1cea89d24 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3427b35000, cur 1563497402 expire 1563497252 last 1563497175 Jul 18 17:50:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 17:52:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 17:52:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 17:53:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 17:53:01 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 18 17:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 17:55:57 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 18 17:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 17:57:43 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 17:58:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 66e1dd6d-0af2-cf8d-543b-4553b25c00db (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19774aec00, cur 1563497882 expire 1563497732 last 1563497655 Jul 18 17:58:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 18:03:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 18:03:11 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 18 18:05:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 18:05:59 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 18:07:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 18:07:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 18:07:51 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 18 18:13:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 18:13:21 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 18:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3ed8f8c3-5dc1-73d3-f4b6-d4e98e0ef06f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f0f9be000, cur 1563498931 expire 1563498781 last 1563498704 Jul 18 18:15:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 18:16:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 18:16:10 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 18:17:04 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 18:17:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 18:17:57 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 18 18:22:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 18:22:31 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 18:24:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 18:24:31 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 18 18:26:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 18:26:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 18:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 18:27:59 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 18 18:30:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 40fb26bb-24fc-f29a-071e-70b7513cfd13 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f288cbf6400, cur 1563499801 expire 1563499651 last 1563499574 Jul 18 18:30:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 18:33:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 18:33:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 18:35:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 18:35:40 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 18:36:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 18:36:22 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 18 18:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 18:38:08 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 18 18:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f457fcfd400, cur 1563500618 expire 1563500468 last 1563500391 Jul 18 18:43:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 18:45:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 18:45:46 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 18 18:46:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 18:46:37 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 18:47:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 46351329-fac3-e638-911a-3ad65de5f370 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f1a722400, cur 1563500857 expire 1563500707 last 1563500630 Jul 18 18:47:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 18:48:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 18 18:48:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 18:48:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 18:48:19 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 18 18:51:32 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 18:55:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 18:55:46 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 18:56:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 18:56:41 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 18:58:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 18:58:31 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 18 18:58:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 18:58:32 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 18 19:03:15 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.109.20@o2ib4 arrived at 1563501795 with bad export cookie 6746082289091017751 Jul 18 19:03:15 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 2577 previous similar messages Jul 18 19:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 19:06:45 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 19:07:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0979247000, cur 1563502022 expire 1563501872 last 1563501795 Jul 18 19:07:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 19:07:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 19:07:18 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 18 19:08:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 19:08:36 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 18 19:10:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 19:10:37 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 19:16:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 19:16:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 19:17:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 19:17:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 19:18:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 496df2b3-3e2c-804b-dd74-f148cf208229 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31f6f03400, cur 1563502685 expire 1563502535 last 1563502458 Jul 18 19:18:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 19:18:44 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 18 19:21:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 19:21:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 19:24:17 fir-md1-s1 kernel: LustreError: 48114:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.109.20@o2ib4 arrived at 1563503057 with bad export cookie 6746082289091017758 Jul 18 19:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 19:26:49 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 19:27:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 19:27:26 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 19:28:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f363665a000, cur 1563503284 expire 1563503134 last 1563503057 Jul 18 19:28:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 18 19:28:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 19:28:58 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 18 19:33:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 19:35:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 846184af-9ef3-d9f7-25ba-34c7a5643b4f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27699ac400, cur 1563503735 expire 1563503585 last 1563503508 Jul 18 19:36:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 19:36:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 19:39:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 19:39:11 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 19:39:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 19:39:41 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 18 19:44:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 19:44:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 19:47:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 19:47:07 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 19:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 19:51:04 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 19:51:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 18 19:51:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 19:56:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 19:56:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 19:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 19:57:18 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 20:01:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 20:01:07 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 18 20:01:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 20:01:13 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 20:05:53 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 20:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 384eb7cc-d6a6-ae74-3ab3-8c86244e6624 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c19c93400, cur 1563505574 expire 1563505424 last 1563505347 Jul 18 20:06:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 20:07:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 20:07:22 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 20:09:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 20:09:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 20:11:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 20:11:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 20:11:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 20:11:16 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 18 20:17:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f38f5eeb400, cur 1563506221 expire 1563506071 last 1563505994 Jul 18 20:17:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 18 20:17:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4526390000, cur 1563506238 expire 1563506088 last 1563506011 Jul 18 20:17:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 20:17:55 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 20:21:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 20:21:24 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 18 20:21:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 20:21:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 18 20:22:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 20:22:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 20:23:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 50394ead-5e3b-e2cc-70e6-1e1b6994d2fd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31aabab000, cur 1563506594 expire 1563506444 last 1563506367 Jul 18 20:24:14 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 20:26:48 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 20:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 20:27:59 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 20:28:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45188a2400, cur 1563506920 expire 1563506770 last 1563506693 Jul 18 20:28:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 20:31:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 20:31:24 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 18 20:33:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 20:33:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 20:34:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 20:34:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 18 20:38:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 20:38:05 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 18 20:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 20:41:25 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 18 20:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a1a6e373-0024-8708-0988-62be8fe6cdf4 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fa37bf800, cur 1563507744 expire 1563507594 last 1563507517 Jul 18 20:43:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f03cd3c8-c9e6-7855-a629-d420925fba3b (at 10.8.23.14@o2ib6) in 210 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fce85000, cur 1563507820 expire 1563507670 last 1563507610 Jul 18 20:43:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 20:44:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 20:44:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 20:48:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 20:48:38 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 20:51:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 20:51:46 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 18 20:52:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 20:52:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 20:55:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 20:55:50 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 20:58:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 20:58:44 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 18 21:00:53 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:00:53 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 18 21:00:54 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:00:54 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f244a95e900 x1631606284609248/t425994642063(0) o36->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:29/0 lens 488/3152 e 1 to 0 dl 1563508859 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:00:57 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:00:59 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:00:59 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 18 21:01:01 fir-md1-s1 kernel: LustreError: 46520:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f268ad0e050 x1631606284705248/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:28/0 lens 
488/440 e 0 to 0 dl 1563508888 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:03 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:01:03 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 18 21:01:03 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f405b6e5200 Jul 18 21:01:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6), client will retry: rc -110 Jul 18 21:01:05 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f28abc5f450 x1638082378256544/t0(0) o4->0eaeb89b-859f-1fc8-d1f0-672563c1d160@10.8.8.24@o2ib6:3/0 lens 488/448 e 0 to 0 dl 1563508893 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:05 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 18 21:01:05 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508858/real 0] req@ffff8f243aa57b00 x1636737669221936/t0(0) o104->fir-MDT0002@10.8.7.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563508865 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:06 fir-md1-s1 kernel: Lustre: 97669:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508859/real 0] req@ffff8f246cac6000 x1636737669229328/t0(0) o106->fir-MDT0000@10.8.27.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563508866 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:06 fir-md1-s1 kernel: Lustre: 97669:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 18 21:01:07 fir-md1-s1 kernel: Lustre: 23730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: 
[sent 1563508860/real 0] req@ffff8f39a429c200 x1636737669240752/t0(0) o104->fir-MDT0002@10.8.8.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563508867 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:07 fir-md1-s1 kernel: Lustre: 23730:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 18 21:01:08 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f405b6e2200 Jul 18 21:01:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1f4e4f5580/0x5d9ee66524b45f2e lrc: 3/0,0 mode: PR/PR res: [0x200025fcc:0xc58a:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x285769c694bada3 expref: 3002975 pid: 20722 timeout: 2623928 lvb_type: 0 Jul 18 21:01:08 fir-md1-s1 kernel: LustreError: 21268:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508868 with bad export cookie 6746082379716063017 Jul 18 21:01:08 fir-md1-s1 kernel: LustreError: 21268:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 29 previous similar messages Jul 18 21:01:09 fir-md1-s1 kernel: LustreError: 21305:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508869 with bad export cookie 6746082379716063017 Jul 18 21:01:09 fir-md1-s1 kernel: LustreError: 21305:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1414 previous similar messages Jul 18 21:01:10 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1e22f5e050 x1631541949606416/t0(0) o4->69db3513-ecbf-49e6-41f1-877c7ce0f3a2@10.8.18.34@o2ib6:22/0 lens 504/448 e 1 to 0 dl 1563508882 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:10 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 
18 21:01:10 fir-md1-s1 kernel: Lustre: 23751:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508863/real 0] req@ffff8f3462276c00 x1636737669267728/t0(0) o106->fir-MDT0002@10.8.17.26@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563508870 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:10 fir-md1-s1 kernel: Lustre: 23751:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 18 21:01:10 fir-md1-s1 kernel: LustreError: 23104:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508870 with bad export cookie 6746082379716063017 Jul 18 21:01:10 fir-md1-s1 kernel: LustreError: 20930:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508870 with bad export cookie 6746082379716063017 Jul 18 21:01:10 fir-md1-s1 kernel: LustreError: 20930:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 2585 previous similar messages Jul 18 21:01:10 fir-md1-s1 kernel: LustreError: 23104:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 114 previous similar messages Jul 18 21:01:12 fir-md1-s1 kernel: LustreError: 20700:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508872 with bad export cookie 6746082379716063017 Jul 18 21:01:12 fir-md1-s1 kernel: LustreError: 25084:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508872 with bad export cookie 6746082379716063017 Jul 18 21:01:12 fir-md1-s1 kernel: LustreError: 25084:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 5314 previous similar messages Jul 18 21:01:12 fir-md1-s1 kernel: LustreError: 20700:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 106 previous similar messages Jul 18 21:01:14 fir-md1-s1 kernel: LustreError: 46515:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2ec263f050 x1631598233997424/t0(0) 
o4->691e4f7c-24cc-f758-5354-96c1b01f1439@10.8.7.7@o2ib6:2/0 lens 488/448 e 0 to 0 dl 1563508892 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:14 fir-md1-s1 kernel: LustreError: 46515:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 18 21:01:14 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18adcba100 x1638082377941104/t0(0) o101->0eaeb89b-859f-1fc8-d1f0-672563c1d160@10.8.8.24@o2ib6:19/0 lens 376/1600 e 1 to 0 dl 1563508879 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:01:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 18 21:01:15 fir-md1-s1 kernel: Lustre: 97643:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1b69529e00 x1638082378020544/t0(0) o101->0eaeb89b-859f-1fc8-d1f0-672563c1d160@10.8.8.24@o2ib6:20/0 lens 376/1600 e 1 to 0 dl 1563508880 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:16 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b4c415800 Jul 18 21:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 69db3513-ecbf-49e6-41f1-877c7ce0f3a2 (at 10.8.18.34@o2ib6), client will retry: rc = -110 Jul 18 21:01:16 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f259636ee00 Jul 18 21:01:16 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a78331000 Jul 18 21:01:16 fir-md1-s1 kernel: Lustre: 25675:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508867/real 0] req@ffff8f2ed4ff3300 
x1636737669312112/t0(0) o106->fir-MDT0000@10.8.27.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563508876 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:16 fir-md1-s1 kernel: Lustre: 25675:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 18 21:01:16 fir-md1-s1 kernel: LustreError: 21764:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508876 with bad export cookie 6746082379716063017 Jul 18 21:01:16 fir-md1-s1 kernel: LustreError: 21764:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 10137 previous similar messages Jul 18 21:01:18 fir-md1-s1 kernel: Lustre: 46530:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1c22606850 x1631563111188384/t0(0) o4->f7f29fbd-f06d-1e4f-a662-2d2ae362522d@10.8.8.7@o2ib6:23/0 lens 488/448 e 1 to 0 dl 1563508883 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:18 fir-md1-s1 kernel: Lustre: 46530:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 18 21:01:22 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f268ad0cc50 x1638899755153984/t0(0) o4->d8428b3f-ceef-fb57-6c0a-b3ad15aaf988@10.8.27.7@o2ib6:6/0 lens 488/448 e 0 to 0 dl 1563508896 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:22 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 18 21:01:22 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b518b5400 Jul 18 21:01:23 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2faa2d3c00 Jul 18 21:01:23 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1adedc6600 Jul 18 21:01:23 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) 
event type 5, status -125, desc ffff8f2a0e637a00 Jul 18 21:01:23 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20a6e65d00 x1631582089543968/t0(0) o101->645c01b6-7440-897a-ad36-a9e0b6138a74@10.8.7.15@o2ib6:28/0 lens 376/1600 e 0 to 0 dl 1563508888 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:23 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 26888:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508884 with bad export cookie 6746082379716063017 Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 31011:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508884 with bad export cookie 6746082379716063017 Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 31011:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 19801 previous similar messages Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 26888:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 70 previous similar messages Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16de8e5200 Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d7b3bbc00 Jul 18 21:01:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ec9719ae-e98d-245f-cb43-8c61dda19eb4 (at 10.8.18.29@o2ib6), client will retry: rc = -110 Jul 18 21:01:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a7d90fc00 Jul 18 21:01:24 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16319f4400 
Jul 18 21:01:25 fir-md1-s1 kernel: Lustre: 23575:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508878/real 0] req@ffff8f09086b8600 x1636737669424800/t0(0) o104->fir-MDT0002@10.8.8.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563508885 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:25 fir-md1-s1 kernel: Lustre: 23575:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 18 21:01:25 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a0e634600 Jul 18 21:01:25 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f27df316c00 Jul 18 21:01:25 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f314f323400 Jul 18 21:01:25 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f27df313e00 Jul 18 21:01:25 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19e09e2200 Jul 18 21:01:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 8bcbb71f-dec9-01fd-fa31-3d32f5a62a50 (at 10.8.8.23@o2ib6), client will retry: rc -110 Jul 18 21:01:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 21:01:29 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d7b3ba000 Jul 18 21:01:29 fir-md1-s1 kernel: Lustre: 22058:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. 
req@ffff8f28abc5b050 x1633733401849904/t0(0) o4->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:25/0 lens 488/448 e 1 to 0 dl 1563508885 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 18 21:01:31 fir-md1-s1 kernel: Lustre: 97661:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3s); client may timeout. req@ffff8f20a6e65d00 x1631582089543968/t353595191417(0) o101->645c01b6-7440-897a-ad36-a9e0b6138a74@10.8.7.15@o2ib6:28/0 lens 376/968 e 0 to 0 dl 1563508888 ref 1 fl Complete:/0/0 rc 0/0 Jul 18 21:01:31 fir-md1-s1 kernel: Lustre: 46521:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f28abc59050 x1631625044701872/t0(0) o4->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:6/0 lens 488/448 e 0 to 0 dl 1563508896 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:31 fir-md1-s1 kernel: Lustre: 46521:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Jul 18 21:01:32 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:01:32 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 18 previous similar messages Jul 18 21:01:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16ee597c00 Jul 18 21:01:33 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a7d90b800 Jul 18 21:01:33 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1907a9e200 Jul 18 21:01:33 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f250d0b5200 Jul 18 21:01:33 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f1fd739ee00 Jul 18 21:01:34 fir-md1-s1 kernel: LustreError: 46528:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f2ec263e850 x1631770071603504/t0(0) o4->bf3478cc-569b-5c14-1a71-20ca1e1f08aa@10.8.12.12@o2ib6:4/0 lens 504/448 e 1 to 0 dl 1563508894 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 38s: evicting client at 10.8.8.24@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e84993a80/0x5d9ee665585955b5 lrc: 4/0,0 mode: PR/PR res: [0x2c002c37a:0x16bc6:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.8.24@o2ib6 remote: 0x492f4229dc85f5fd expref: 391 pid: 21481 timeout: 2623957 lvb_type: 0 Jul 18 21:01:38 fir-md1-s1 kernel: LustreError: 25633:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1a9fa69450 x1636571989240880/t0(0) o4->0d321477-e1a4-6634-93cf-b59d753ff98f@10.8.18.6@o2ib6:8/0 lens 488/448 e 0 to 0 dl 1563508898 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:39 fir-md1-s1 kernel: LustreError: 22058:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2e76603c50 x1631567495152048/t0(0) o4->024f7538-830d-bf6f-afb4-1c31cea1bee4@10.8.8.5@o2ib6:0/0 lens 488/448 e 0 to 0 dl 1563508920 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:39 fir-md1-s1 kernel: LustreError: 22058:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 18 previous similar messages Jul 18 21:01:39 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d32213400 Jul 18 21:01:39 fir-md1-s1 kernel: Lustre: 25633:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f1a9fa69450 x1636571989240880/t0(0) o4->0d321477-e1a4-6634-93cf-b59d753ff98f@10.8.18.6@o2ib6:8/0 lens 488/448 e 0 to 0 dl 1563508898 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 18 21:01:39 fir-md1-s1 kernel: Lustre: 25633:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 18 21:01:39 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.29@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e6f035e80/0x5d9ee6652ca37d94 lrc: 4/0,0 mode: PR/PR res: [0x2c002c1dd:0x5011:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.29@o2ib6 remote: 0xde59b70fec344fff expref: 452 pid: 23746 timeout: 2623959 lvb_type: 0 Jul 18 21:01:40 fir-md1-s1 kernel: LustreError: 46810:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508900 with bad export cookie 6746082379716063017 Jul 18 21:01:40 fir-md1-s1 kernel: LustreError: 46810:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 35293 previous similar messages Jul 18 21:01:41 fir-md1-s1 kernel: Lustre: 21455:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508894/real 0] req@ffff8f246cac1800 x1636737669609568/t0(0) o104->fir-MDT0002@10.8.15.10@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563508901 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:01:41 fir-md1-s1 kernel: Lustre: 21455:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages Jul 18 21:01:43 fir-md1-s1 kernel: LustreError: 21460:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f188cf7d000 ns: mdt-fir-MDT0002_UUID lock: ffff8f23b6859d40/0x5d9ee6655abe946e lrc: 1/0,0 mode: EX/EX res: [0x2c002c37a:0x16bc6:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.8.24@o2ib6 remote: 0x492f4229dc85f968 expref: 7 pid: 21460 timeout: 0 lvb_type: 3 Jul 18 21:01:46 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c35d8a79-a513-f035-a38a-80dae6993f70 (at 10.8.20.17@o2ib6) Jul 18 21:01:46 fir-md1-s1 kernel: Lustre: Skipped 1441 previous similar messages Jul 18 21:01:47 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20552c2c00 Jul 18 21:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ad5b8b9d-f149-444a-fb05-2479a0cbbcd5 (at 10.8.15.10@o2ib6), client will retry: rc = -110 Jul 18 21:01:47 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 18 21:01:47 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:8s); client may timeout. req@ffff8f250754a050 x1639149973013328/t0(0) o4->ad5b8b9d-f149-444a-fb05-2479a0cbbcd5@10.8.15.10@o2ib6:9/0 lens 488/448 e 0 to 0 dl 1563508899 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 18 21:01:47 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 18 21:01:49 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f25219b7000 ns: mdt-fir-MDT0002_UUID lock: ffff8f1b44375340/0x5d9ee6655ba9bb56 lrc: 1/0,0 mode: EX/EX res: [0x2c002c0b0:0x4915:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.7.7@o2ib6 remote: 0xc3f349d58c3ba3fa expref: 5 pid: 24580 timeout: 0 lvb_type: 3 Jul 18 21:01:49 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f228a55c450 x1638883280086560/t0(0) o4->bc11f22c-be02-95cf-a44c-67712cc4b020@10.8.12.7@o2ib6:24/0 lens 504/448 e 0 to 0 dl 1563508914 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:49 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages Jul 18 21:01:54 fir-md1-s1 kernel: 
LustreError: 21448:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f250754c450 x1631556669031616/t0(0) o4->02ff72b7-013c-f5b2-3098-e3501810341b@10.8.8.6@o2ib6:24/0 lens 488/448 e 0 to 0 dl 1563508914 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:01:54 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f200b34aa00 Jul 18 21:01:55 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c1bb3ea00 Jul 18 21:01:55 fir-md1-s1 kernel: Lustre: 21541:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:14s); client may timeout. req@ffff8f1c6efed050 x1631615409348336/t0(0) o4->195b2aff-dd7c-f763-3c56-263374ade64c@10.8.7.12@o2ib6:11/0 lens 488/448 e 0 to 0 dl 1563508901 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 18 21:01:55 fir-md1-s1 kernel: Lustre: 21541:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1947f54400 Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 21379:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2c43aa7400 ns: mdt-fir-MDT0002_UUID lock: ffff8f349a87f500/0x5d9ee6655bcf20ee lrc: 1/0,0 mode: EX/EX res: [0x2c002c4ae:0x1707:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.17.10@o2ib6 remote: 0xdc24cf5a10726da2 expref: 6 pid: 21379 timeout: 0 lvb_type: 3 Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2767e87a00 Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33c96b1c00 Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f2053f5ae00 Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.17.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3e4c212640/0x5d9ee665597c19bd lrc: 4/0,0 mode: CR/CR res: [0x2c002beba:0x4a5f:0x0].0x0 bits 0x9/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.17.14@o2ib6 remote: 0x88967e2a1657fd58 expref: 401 pid: 50442 timeout: 2623976 lvb_type: 0 Jul 18 21:01:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 18 21:01:57 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2233db1600 Jul 18 21:01:57 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1947f5d800 Jul 18 21:01:57 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32a660d600 Jul 18 21:01:59 fir-md1-s1 kernel: LustreError: 97665:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f188cf7d000 ns: mdt-fir-MDT0002_UUID lock: ffff8f18d6b3d7c0/0x5d9ee6655ad8ab47 lrc: 1/0,0 mode: EX/EX res: [0x2c002c37a:0x16bbd:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.8.24@o2ib6 remote: 0x492f4229dc85f96f expref: 3 pid: 97665 timeout: 0 lvb_type: 3 Jul 18 21:01:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ed8e89c00 Jul 18 21:02:02 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ea6d1ba00 Jul 18 21:02:05 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:02:05 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) 
Skipped 21 previous similar messages Jul 18 21:02:05 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33c96b6400 Jul 18 21:02:08 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a81f0e000 Jul 18 21:02:09 fir-md1-s1 kernel: LustreError: 22280:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563508839, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1cd0d0cc80/0x5d9ee66558cf43b2 lrc: 3/0,1 mode: --/PW res: [0x200025fcc:0xc58a:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22280 timeout: 0 lvb_type: 0 Jul 18 21:02:12 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d04bd0e00 Jul 18 21:02:12 fir-md1-s1 kernel: Lustre: 46560:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:37s); client may timeout. 
req@ffff8f228a55ec50 x1636424947053072/t0(0) o4->62c3a024-34de-fd61-6956-bb3675e9d145@10.8.1.13@o2ib6:5/0 lens 488/448 e 1 to 0 dl 1563508895 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 18 21:02:12 fir-md1-s1 kernel: Lustre: 46560:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 15 previous similar messages Jul 18 21:02:12 fir-md1-s1 kernel: LustreError: 20368:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508932 with bad export cookie 6746082379716063017 Jul 18 21:02:12 fir-md1-s1 kernel: LustreError: 20368:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 62633 previous similar messages Jul 18 21:02:13 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f228a55b050 x1631573162962384/t0(0) o4->408be9ad-1c10-b6aa-e3da-e3970b5ae7cb@10.8.8.4@o2ib6:13/0 lens 488/448 e 0 to 0 dl 1563508933 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:02:13 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 18 21:02:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17996a9c00 Jul 18 21:02:14 fir-md1-s1 kernel: LustreError: 46515:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f28abc5f850 x1635345741050208/t0(0) o4->b9fbdf1a-1933-2972-672c-32134f9ae4cb@10.8.1.19@o2ib6:25/0 lens 488/448 e 0 to 0 dl 1563508945 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:02:14 fir-md1-s1 kernel: LustreError: 46515:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 18 21:02:17 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dddfc0200 Jul 18 21:02:17 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ed30dd800 Jul 18 21:02:25 fir-md1-s1 kernel: Lustre: 
46531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1dd97d9450 x1636423556090208/t0(0) o4->95ec1043-1dcc-efe7-2a3e-ad37fdc1e09c@10.8.1.9@o2ib6:0/0 lens 488/448 e 0 to 0 dl 1563508950 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:02:25 fir-md1-s1 kernel: Lustre: 46531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Jul 18 21:02:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2acea81200 Jul 18 21:02:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with bc83c7c5-08aa-b1e5-1dd5-b1a51ba5cb4a (at 10.8.1.15@o2ib6), client will retry: rc = -110 Jul 18 21:02:29 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 18 21:02:30 fir-md1-s1 kernel: LustreError: 46532:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1dd97d9450 x1636423556090208/t0(0) o4->95ec1043-1dcc-efe7-2a3e-ad37fdc1e09c@10.8.1.9@o2ib6:0/0 lens 488/448 e 0 to 0 dl 1563508950 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:02:35 fir-md1-s1 kernel: Lustre: 97643:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563508929/real 0] req@ffff8f1dfe7f3000 x1636737669950496/t0(0) o106->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563508955 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:02:35 fir-md1-s1 kernel: Lustre: 97643:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 18 21:02:37 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20e3c33000 Jul 18 21:02:39 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ea62a8200 Jul 18 21:02:39 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f23eb156400 Jul 18 
21:02:47 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2508ea8c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2901462880/0x5d9ee6655d296950 lrc: 1/0,0 mode: EX/EX res: [0x2c002beba:0x4a5f:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.17.14@o2ib6 remote: 0x88967e2a1657ffdc expref: 3 pid: 23704 timeout: 0 lvb_type: 3 Jul 18 21:02:47 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:50s); client may timeout. req@ffff8f2f4cdbbc00 x1634527010928032/t353595876430(0) o101->6249cf6f-bb2a-fddf-054f-d075abe74eeb@10.8.17.14@o2ib6:27/0 lens 376/1568 e 0 to 0 dl 1563508917 ref 1 fl Complete:/0/0 rc -107/-107 Jul 18 21:02:47 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 10 previous similar messages Jul 18 21:02:53 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16319c2000 Jul 18 21:03:00 fir-md1-s1 kernel: LNetError: 48115:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.15.4@o2ib6 from 10.0.10.51@o2ib7 Jul 18 21:03:06 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2acb3ede00 Jul 18 21:03:16 fir-md1-s1 kernel: LustreError: 23101:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.15.4@o2ib6 arrived at 1563508996 with bad export cookie 6746082379716063017 Jul 18 21:03:16 fir-md1-s1 kernel: LustreError: 23101:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 155345 previous similar messages Jul 18 21:03:17 fir-md1-s1 kernel: LustreError: 46531:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f228a558c50 x1634530228572768/t0(0) o4->98b812d8-7bc3-6324-d90c-0dc15df187f0@10.8.17.15@o2ib6:17/0 lens 488/448 e 0 to 0 dl 1563508997 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 
21:03:17 fir-md1-s1 kernel: LustreError: 46531:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 18 21:03:19 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2e76602050 x1635347293376080/t0(0) o3->c1c54f8a-db68-72ea-1f4f-3dc905e7ab7d@10.8.1.16@o2ib6:18/0 lens 488/440 e 0 to 0 dl 1563509028 ref 1 fl Interpret:/0/0 rc 0/0 Jul 18 21:03:19 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 18 21:03:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:03:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 12 previous similar messages Jul 18 21:03:22 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f236b691a00 Jul 18 21:03:29 fir-md1-s1 kernel: Lustre: 21536:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2e76600850 x1631313422977232/t0(0) o4->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:4/0 lens 488/448 e 0 to 0 dl 1563509014 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:03:29 fir-md1-s1 kernel: Lustre: 21536:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Jul 18 21:03:34 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e14ca7000 Jul 18 21:03:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ca6cc8c9-eda8-7df6-f50b-18cb02b30acf (at 10.8.1.11@o2ib6), client will retry: rc = -110 Jul 18 21:03:34 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 18 21:03:49 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2eb6767a00 Jul 18 21:03:59 fir-md1-s1 
kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f172a0a3000 Jul 18 21:03:59 fir-md1-s1 kernel: Lustre: 46532:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:89s); client may timeout. req@ffff8f1dd97d9450 x1636423556090208/t0(0) o4->95ec1043-1dcc-efe7-2a3e-ad37fdc1e09c@10.8.1.9@o2ib6:0/0 lens 488/448 e 0 to 0 dl 1563508950 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 18 21:03:59 fir-md1-s1 kernel: Lustre: 46532:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 18 21:04:00 fir-md1-s1 kernel: LNet: Service thread pid 22280 was inactive for 200.48s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 18 21:04:00 fir-md1-s1 kernel: Pid: 22280, comm: mdt01_042 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:04:00 fir-md1-s1 kernel: Call Trace: Jul 18 21:04:00 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:04:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:04:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:04:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:04:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 
21:04:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:04:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:04:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:04:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509040.22280 Jul 18 21:04:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 21:04:37 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 18 21:04:41 fir-md1-s1 kernel: LNet: Service thread pid 50447 was inactive for 200.42s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 18 21:04:41 fir-md1-s1 kernel: Pid: 50447, comm: mdt01_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:04:41 fir-md1-s1 kernel: Call Trace: Jul 18 21:04:41 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 18 21:04:41 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 18 21:04:41 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 18 21:04:41 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 18 21:04:41 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:04:41 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0 Jul 18 21:04:41 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:04:41 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:04:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509081.50447 Jul 18 21:04:42 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24fbc26000 Jul 18 21:04:42 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19e67d4600 Jul 18 21:04:53 fir-md1-s1 kernel: LNet: Service thread pid 97661 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 18 21:04:53 fir-md1-s1 kernel: Pid: 97661, comm: mdt01_100 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:04:53 fir-md1-s1 kernel: Call Trace: Jul 18 21:04:53 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 18 21:04:53 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 18 21:04:53 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 18 21:04:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 18 21:04:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:04:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:04:53 fir-md1-s1 
kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:04:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:04:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509093.97661 Jul 18 21:04:55 fir-md1-s1 kernel: LNet: Service thread pid 50447 completed after 214.89s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 18 21:04:56 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15b5e05c00 Jul 18 21:05:03 fir-md1-s1 kernel: Lustre: 23710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563509096/real 0] req@ffff8f2abf94e300 x1636737670941008/t0(0) o104->fir-MDT0002@10.8.8.21@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563509103 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 18 21:05:13 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2047d8f400 Jul 18 21:05:23 fir-md1-s1 kernel: LNet: Service thread pid 97661 completed after 230.82s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 18 21:05:25 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2b8428b600 Jul 18 21:05:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ea6318d80/0x5d9ee66524c7b26e lrc: 4/0,0 mode: PR/PR res: [0x2c00130be:0x1b17:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.8.21@o2ib6 remote: 0xe913e4ad0445b3c5 expref: 386 pid: 21679 timeout: 2624185 lvb_type: 0 Jul 18 21:05:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 12 previous similar messages Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ef399ac00 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f326d3ece00 Jul 18 21:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 95ec1043-1dcc-efe7-2a3e-ad37fdc1e09c (at 10.8.1.9@o2ib6), client will retry: rc -110 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f326d3e8e00 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f326d3e8600 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3145336c00 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3145335200 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f3245c200 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 
5, status -125, desc ffff8f2ef399d400 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f55b78800 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f2b14fc00 Jul 18 21:05:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f2b14b800 Jul 18 21:05:30 fir-md1-s1 kernel: LNet: Service thread pid 97643 was inactive for 200.48s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 18 21:05:30 fir-md1-s1 kernel: Pid: 97643, comm: mdt01_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:05:30 fir-md1-s1 kernel: Call Trace: Jul 18 21:05:30 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 18 21:05:30 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 18 21:05:30 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 18 21:05:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 18 21:05:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:05:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:05:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:05:30 
fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:05:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509130.97643 Jul 18 21:05:34 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 18 21:05:34 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 21 previous similar messages Jul 18 21:05:35 fir-md1-s1 kernel: LNet: Service thread pid 97643 completed after 205.60s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 18 21:05:35 fir-md1-s1 kernel: LustreError: 23101:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.21@o2ib6 arrived at 1563509135 with bad export cookie 6746082289100638810 Jul 18 21:05:35 fir-md1-s1 kernel: LustreError: 23101:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 46875 previous similar messages Jul 18 21:05:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0eaeb89b-859f-1fc8-d1f0-672563c1d160 (at 10.8.8.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c43155800, cur 1563509141 expire 1563508991 last 1563508914 Jul 18 21:05:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 21:05:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.2.34@o2ib6, removing former export from same NID Jul 18 21:05:51 fir-md1-s1 kernel: Lustre: Skipped 3480 previous similar messages Jul 18 21:06:12 fir-md1-s1 kernel: LustreError: 23627:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f34f6e0e300 x1636737671899280/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:06:12 fir-md1-s1 kernel: LustreError: 23627:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 18 21:06:16 fir-md1-s1 kernel: LustreError: 21672:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f34f6e0b000 x1636737671971664/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:06:20 fir-md1-s1 kernel: LustreError: 24582:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f243aa53c00 x1636737672047408/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:06:31 fir-md1-s1 kernel: LustreError: 23757:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2abf94e300 x1636737672217008/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:06:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f223aa46300/0x5d9ee6652414f447 lrc: 3/0,0 mode: PR/PR res: [0x2000260f9:0xbdb2:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x285769c693a9f32 expref: 1643875 pid: 20545 timeout: 2624261 lvb_type: 0 Jul 18 
21:06:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 827c28fa-8def-51be-28bf-16efe28ca63d (at 10.8.17.15@o2ib6) in 212 seconds. I think it's dead, and I am evicting it. exp ffff8f34e9a1e800, cur 1563509210 expire 1563509060 last 1563508998 Jul 18 21:06:50 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 18 21:06:50 fir-md1-s1 kernel: LustreError: 23684:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2abf948f00 x1636737672476768/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:06:50 fir-md1-s1 kernel: LustreError: 23684:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 18 21:07:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2758eeec00/0x5d9ee66525213978 lrc: 3/0,0 mode: PR/PR res: [0x200025fcc:0x1316d:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x285769c694d9622 expref: 1605088 pid: 23757 timeout: 2624280 lvb_type: 0 Jul 18 21:07:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 18 21:07:35 fir-md1-s1 kernel: LustreError: 23645:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ec4144e00 x1636737673028288/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:07:42 fir-md1-s1 kernel: LustreError: 23627:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563509172, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f34b829af40/0x5d9ee665760202ae lrc: 3/0,1 mode: --/PW res: [0x2000260f9:0xbdb2:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 
expref: -99 pid: 23627 timeout: 0 lvb_type: 0 Jul 18 21:08:00 fir-md1-s1 kernel: Lustre: 23725:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2df27ebf00 x1638102643541952/t0(0) o36->95c23571-6ded-28b5-8b2e-63d85e709c23@10.8.15.4@o2ib6:5/0 lens 488/3152 e 0 to 0 dl 1563509285 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:08:00 fir-md1-s1 kernel: Lustre: 23725:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jul 18 21:08:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2ea932ec00/0x5d9ee65f4be33141 lrc: 3/0,0 mode: PR/PR res: [0x200029d2a:0x35b2:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x285769c1a8fee69 expref: 1482108 pid: 21415 timeout: 2624344 lvb_type: 0 Jul 18 21:08:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 18 21:08:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6) reconnecting Jul 18 21:08:56 fir-md1-s1 kernel: Lustre: Skipped 7458 previous similar messages Jul 18 21:08:59 fir-md1-s1 kernel: LustreError: 23758:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f3462271e00 x1636737673535392/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:08:59 fir-md1-s1 kernel: LustreError: 23758:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 18 21:09:05 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563509255, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3457ac5a00/0x5d9ee6657cfbec1d lrc: 
3/0,1 mode: --/PW res: [0x200029d2a:0x35b2:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23645 timeout: 0 lvb_type: 0 Jul 18 21:09:05 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Jul 18 21:09:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1ec115d340/0x5d9ee66524e0f9ee lrc: 3/0,0 mode: PR/PR res: [0x2000260f9:0xeca8:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x285769c694c77f4 expref: 1335137 pid: 22288 timeout: 2624428 lvb_type: 0 Jul 18 21:09:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 18 21:09:32 fir-md1-s1 kernel: LNet: Service thread pid 23627 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 18 21:09:32 fir-md1-s1 kernel: Pid: 23627, comm: mdt02_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:09:32 fir-md1-s1 kernel: Call Trace: Jul 18 21:09:32 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:09:32 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:09:32 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:09:32 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:09:32 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:09:32 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:09:32 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:09:32 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:09:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509372.23627 Jul 18 21:09:51 fir-md1-s1 kernel: Pid: 23757, comm: mdt02_107 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:09:51 fir-md1-s1 kernel: Call Trace: Jul 18 21:09:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:09:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 
[mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:09:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:09:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:09:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:09:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:09:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:09:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:09:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509391.23757 Jul 18 21:09:58 fir-md1-s1 kernel: Pid: 23744, comm: mdt02_095 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:09:58 fir-md1-s1 kernel: Call Trace: Jul 18 21:09:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:09:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:09:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:09:58 
fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:09:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:09:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:09:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:09:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:09:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509398.23744 Jul 18 21:10:10 fir-md1-s1 kernel: LNet: Service thread pid 23684 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 18 21:10:10 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Jul 18 21:10:10 fir-md1-s1 kernel: Pid: 23684, comm: mdt02_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:10:10 fir-md1-s1 kernel: Call Trace: Jul 18 21:10:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:10:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:10:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:10:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:10:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:10:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:10:11 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:10:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:10:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509411.23684 Jul 18 21:10:55 fir-md1-s1 kernel: Pid: 23645, comm: mdt02_061 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:10:55 fir-md1-s1 kernel: Call Trace: Jul 18 21:10:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:10:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:10:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:10:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:10:55 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:10:55 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:10:55 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:10:55 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:10:56 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:10:56 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:10:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:10:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:10:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:10:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:10:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:10:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:10:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509456.23645 Jul 18 21:11:37 fir-md1-s1 kernel: LNet: Service thread pid 20511 was inactive for 200.70s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Jul 18 21:11:37 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Jul 18 21:11:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509497.20511 Jul 18 21:11:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 21:11:55 fir-md1-s1 kernel: Lustre: Skipped 10275 previous similar messages Jul 18 21:12:19 fir-md1-s1 kernel: LNet: Service thread pid 23758 was inactive for 200.52s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 18 21:12:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509539.23758 Jul 18 21:14:37 fir-md1-s1 kernel: LNet: Service thread pid 22280 completed after 837.31s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 18 21:14:56 fir-md1-s1 kernel: LustreError: 23597:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f18d496cb00 x1636737677776848/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 18 21:14:57 fir-md1-s1 kernel: Lustre: 20719:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-380), not sending early reply req@ffff8f19571c5400 x1631606285364736/t426054392411(0) o36->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:2/0 lens 488/3152 e 0 to 0 dl 1563509702 ref 2 fl Interpret:/0/0 rc 0/0 Jul 18 21:15:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f268ec5d100/0x5d9ee66523e76a30 lrc: 3/0,0 mode: PR/PR res: [0x2000260f9:0x268b:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0x285769c693957d6 expref: 821574 pid: 23692 timeout: 2624785 lvb_type: 0 Jul 18 21:15:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 
10.8.11.6@o2ib6, removing former export from same NID Jul 18 21:15:54 fir-md1-s1 kernel: Lustre: Skipped 695 previous similar messages Jul 18 21:16:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 21:16:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 21:16:26 fir-md1-s1 kernel: LustreError: 23597:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563509696, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f275196dc40/0x5d9ee665a78279bf lrc: 3/0,1 mode: --/PW res: [0x2000260f9:0x268b:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23597 timeout: 0 lvb_type: 0 Jul 18 21:16:26 fir-md1-s1 kernel: LustreError: 23597:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jul 18 21:17:04 fir-md1-s1 kernel: LNet: Service thread pid 23757 completed after 633.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 18 21:18:17 fir-md1-s1 kernel: LNet: Service thread pid 23597 was inactive for 200.53s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 18 21:18:17 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 18 21:18:17 fir-md1-s1 kernel: Pid: 23597, comm: mdt02_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:18:17 fir-md1-s1 kernel: Call Trace: Jul 18 21:18:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:18:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 18 21:18:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:18:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:18:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:18:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:18:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:18:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:18:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563509897.23597 Jul 18 21:18:18 fir-md1-s1 kernel: LNet: Service thread pid 20511 completed after 600.82s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 18 21:19:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6) reconnecting Jul 18 21:19:12 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 21:20:23 fir-md1-s1 kernel: LNet: Service thread pid 23684 completed after 812.64s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 18 21:20:23 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 18 21:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 21:21:55 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 18 21:22:03 fir-md1-s1 kernel: LNet: Service thread pid 21446 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 18 21:22:03 fir-md1-s1 kernel: Pid: 21446, comm: mdt01_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 18 21:22:03 fir-md1-s1 kernel: Call Trace: Jul 18 21:22:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 18 21:22:03 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 18 21:22:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 18 21:22:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 18 21:22:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 18 21:22:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563510123.21446 Jul 18 21:23:04 fir-md1-s1 kernel: LNet: Service thread pid 23627 completed after 1012.71s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 18 21:23:04 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 18 21:26:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 21:26:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 18 21:26:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 21:26:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 18 21:27:56 fir-md1-s1 kernel: LNet: Service thread pid 23758 completed after 1137.55s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 18 21:27:56 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Jul 18 21:29:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 21:29:36 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 18 21:29:59 fir-md1-s1 kernel: LustreError: 31003:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.109.20@o2ib4 arrived at 1563510599 with bad export cookie 6746082610267664549 Jul 18 21:31:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd13f11d-b08a-803e-892a-5da5d2c02b5d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16210a3000, cur 1563510688 expire 1563510538 last 1563510461 Jul 18 21:31:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 21:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 21:32:19 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 18 21:32:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e34dace-5c2a-fdef-c955-ab11aa1428b7 (at 10.9.109.20@o2ib4) in 165 seconds. I think it's dead, and I am evicting it. exp ffff8f0653f73c00, cur 1563510764 expire 1563510614 last 1563510599 Jul 18 21:32:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 21:37:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 21:37:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 18 21:37:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 21:37:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 21:39:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 21:39:51 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 21:42:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 21:42:45 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 18 21:48:00 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 21:48:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 21:48:52 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 21:50:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 21:50:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 21:52:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 21:52:45 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 21:59:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 21:59:17 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 18 22:00:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 22:00:55 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 18 22:02:47 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 18 22:02:47 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 18 22:02:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 95b49d98-54c6-9e12-2fd2-818a97bece53 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f173afec800, cur 1563512578 expire 1563512428 last 1563512351 Jul 18 22:03:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 22:03:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 22:08:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 22:08:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 22:09:10 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 22:09:10 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 18 22:09:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 22:09:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 22:11:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 22:11:16 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 18 22:13:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 22:13:07 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 18 
22:15:50 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 22:20:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 22:20:18 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 18 22:21:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 22:21:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 22:23:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 22:23:08 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 18 22:27:31 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 22:30:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 22:30:49 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 18 22:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c24819b3-f636-5874-7092-d522c8100e80 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30d62bc000, cur 1563514291 expire 1563514141 last 1563514064 Jul 18 22:31:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 22:31:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 22:31:40 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 22:33:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 22:33:08 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 18 22:36:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 22:37:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8ced765c-0efd-d10a-93ee-05a3d5e43404 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f18202f0800, cur 1563514640 expire 1563514490 last 1563514413 Jul 18 22:37:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 22:39:10 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 22:39:44 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 22:41:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 22:41:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 22:41:41 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 22:41:41 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 22:43:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Jul 18 22:43:16 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 18 22:44:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 22:50:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 22:51:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 22:51:54 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 18 22:52:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 22:52:53 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 18 22:53:13 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 22:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 18 22:54:02 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 18 23:02:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 18 23:02:30 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 18 23:03:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 23:03:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 18 23:03:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 23:03:47 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 18 23:04:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 23:04:02 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 18 23:08:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 87307b55-7fb1-1d48-dcde-460e575d93d8 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e3be6a400, cur 1563516516 expire 1563516366 last 1563516289 Jul 18 23:08:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 18 23:13:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 23:13:06 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 18 23:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 23:14:08 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 23:14:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 23:14:16 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 18 23:23:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 18 23:23:10 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 18 23:24:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 23:24:26 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 18 23:25:59 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 23:25:59 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 18 23:27:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:27:08 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 23:31:48 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 18 23:33:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 18 23:33:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 18 23:34:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 18 23:34:39 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 18 23:37:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28232cd800, cur 1563518257 expire 1563518107 last 1563518030 Jul 18 23:37:37 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 18 23:38:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 23:38:24 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 18 23:38:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a4722b8b-7996-7aae-371b-e39d95366788 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1b00dde000, cur 1563518306 expire 1563518156 last 1563518079 Jul 18 23:38:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 23:39:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b4bd5853-d5cf-316d-4703-3ee7227da830 (at 10.8.23.14@o2ib6) in 188 seconds. I think it's dead, and I am evicting it. exp ffff8f25219a9c00, cur 1563518382 expire 1563518232 last 1563518194 Jul 18 23:39:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 23:40:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5b6dbe09-e800-1f10-8620-b58fefc3f442 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ddeab0800, cur 1563518421 expire 1563518271 last 1563518194 Jul 18 23:40:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 18 23:43:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 18 23:43:38 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 18 23:44:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 18 23:44:41 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 18 23:44:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 18 23:49:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:49:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 18 23:49:42 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 18 23:51:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:52:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:53:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 18 23:53:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 18 23:54:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 18 23:54:43 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 18 23:55:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 18 23:57:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 00:02:57 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 00:03:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 00:03:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 19 00:03:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 00:03:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 00:04:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 00:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 00:04:49 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 19 00:11:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8e06ac75-ce37-b0a2-7045-829521e9a97d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34e56abc00, cur 1563520271 expire 1563520121 last 1563520044 Jul 19 00:11:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 00:11:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 00:13:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 19 00:13:11 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 19 00:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 00:14:06 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 19 00:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 00:15:16 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 19 00:19:55 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 00:20:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 00:20:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 00:23:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 00:23:49 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 19 00:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 00:24:41 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 00:25:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 00:25:17 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 19 00:25:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1c92f1dc00, cur 1563521127 expire 1563520977 last 1563520900 Jul 19 00:25:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 00:28:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7595877a-f4ff-657e-5307-b479d65f9536 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f2249cc00, cur 1563521320 expire 1563521170 last 1563521093 Jul 19 00:34:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 00:34:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 00:34:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 00:34:15 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 00:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 00:35:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 00:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 00:35:19 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 19 00:42:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7776f16c-cc8d-e400-6940-899eab29906f (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f282e410000, cur 1563522177 expire 1563522027 last 1563521950 Jul 19 00:42:57 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 19 00:44:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c0036d24-18dd-a3ae-78f8-34a2d6647dd0 (at 10.8.23.14@o2ib6) in 190 seconds. I think it's dead, and I am evicting it. 
exp ffff8f240e353000, cur 1563522253 expire 1563522103 last 1563522063 Jul 19 00:44:13 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 19 00:44:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 00:44:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 00:44:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d15474ff-8e0a-fd7d-1a76-a3f0f2e56b48 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fdd86e000, cur 1563522290 expire 1563522140 last 1563522063 Jul 19 00:44:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 19 00:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 00:45:27 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 19 00:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 00:45:27 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 00:54:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 00:54:50 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 19 00:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 00:55:58 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 00:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 00:55:58 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 19 00:58:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 00:58:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 01:03:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 01:04:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 01:04:53 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 01:06:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 01:06:28 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 01:06:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 01:06:28 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 01:11:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 01:15:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7c5cb8ba-1bf6-8823-0b4e-75b530ae1fdc (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d813d7800, cur 1563524133 expire 1563523983 last 1563523906 Jul 19 01:15:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 01:15:57 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 01:16:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 01:16:56 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 01:16:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 01:16:56 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 19 01:26:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 01:26:20 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 19 01:27:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 01:27:02 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 01:27:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 01:27:02 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 01:29:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 01:30:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 01:32:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 01:32:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 01:37:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 01:37:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 01:37:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 01:37:13 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 01:37:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 01:37:17 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 01:43:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 22029d79-cfc3-d458-b1d8-ddf84692fa5f (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d831eb000, cur 1563525804 expire 1563525654 last 1563525577 Jul 19 01:43:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 01:44:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 01:44:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 01:47:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 01:47:13 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 19 01:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 01:47:19 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 01:49:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 01:49:35 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 01:57:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 01:57:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 01:57:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 01:57:25 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 19 02:00:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 02:00:44 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 02:07:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 02:07:26 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 02:07:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 02:07:26 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 19 02:08:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 02:08:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 02:12:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 02:12:53 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 02:14:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 02:16:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 02:16:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 02:17:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 02:17:32 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 19 02:17:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 02:17:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 02:22:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 02:22:53 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 02:23:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 02:27:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 02:27:33 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 19 02:27:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 02:27:47 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 19 02:30:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 02:30:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 02:33:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 02:33:10 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 02:37:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 02:37:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 19 02:37:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 02:37:51 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 19 02:40:36 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 02:43:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 02:43:45 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 19 02:47:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 02:47:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 02:47:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 02:47:53 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 02:48:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 02:48:23 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 19 02:54:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 02:54:08 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 02:58:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 02:58:01 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 19 02:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 02:58:57 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 03:04:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 03:04:10 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 03:08:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 03:08:04 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 03:09:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 03:09:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 03:11:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 03:12:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 03:14:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 03:14:13 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 19 03:18:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 03:18:19 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 19 03:19:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 03:19:36 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 03:21:40 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 03:24:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 03:24:15 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 19 03:28:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 03:28:19 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 03:30:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 03:30:06 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 03:34:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 03:34:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 03:34:19 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 19 03:38:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 03:38:28 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 19 03:40:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 03:40:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 03:40:12 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 03:45:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 03:45:30 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 19 03:46:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 03:47:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 03:47:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 03:48:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 03:48:56 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 03:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 03:50:13 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 03:52:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28a16f4c00, cur 1563533520 expire 1563533370 last 1563533293 Jul 19 03:52:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 03:55:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 03:55:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 19 03:59:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 03:59:19 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 03:59:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 03:59:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 04:00:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 04:00:25 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 04:04:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f1a650400, cur 1563534261 expire 1563534111 last 1563534034 Jul 19 04:05:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 04:05:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 04:06:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 04:06:37 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 04:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 04:09:44 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 19 04:10:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 04:10:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 04:12:52 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1df06d6400, cur 1563534772 expire 1563534622 last 1563534545 Jul 19 04:15:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 04:15:59 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 04:17:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 04:17:11 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 19 04:20:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 04:20:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 04:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 19 04:21:41 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 19 04:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 04:27:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 19 04:30:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 04:30:08 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 19 04:30:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 04:30:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 04:31:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 19 04:31:45 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 04:38:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 04:38:23 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 04:40:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 04:40:32 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 04:42:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 04:42:00 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 04:44:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 04:44:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 04:48:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 04:48:25 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 04:50:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 04:50:42 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 19 04:52:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 04:52:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 04:58:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 04:58:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 19 05:01:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 05:01:03 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 19 05:02:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 05:02:53 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 05:03:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 05:03:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 05:09:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 05:09:11 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 19 05:11:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 05:11:39 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 19 05:13:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 05:13:18 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 05:19:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 19 05:19:28 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 19 05:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 05:21:43 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 19 05:23:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 05:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 05:24:56 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 05:29:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 05:29:29 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 05:31:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 05:31:44 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 19 05:34:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 05:34:05 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 05:35:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 05:35:21 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 05:39:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 05:39:32 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 19 05:42:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 05:42:03 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 05:45:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 19 05:45:39 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 05:46:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 05:46:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 05:49:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 05:49:38 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 19 05:52:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 05:52:28 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 19 05:55:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 05:55:43 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 05:58:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 05:58:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 05:59:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 05:59:41 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 06:02:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 06:02:34 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 19 06:06:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 06:06:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 06:10:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 06:10:46 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 19 06:13:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 06:13:06 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 19 06:14:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 06:14:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 06:16:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 06:16:29 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 06:23:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 06:23:05 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 06:23:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 06:23:07 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 19 06:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 06:26:49 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 06:29:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 06:29:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 06:33:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 06:33:06 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 19 06:33:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 06:33:08 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 19 06:36:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 06:36:56 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 19 06:43:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 06:43:07 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 06:43:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 06:43:15 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 19 06:47:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 06:47:39 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 06:53:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 06:53:54 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 19 06:54:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 06:55:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 06:55:38 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 19 06:57:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 06:58:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 06:58:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 07:02:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 07:04:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 07:04:00 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 19 07:05:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 07:05:40 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 07:08:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 07:08:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 07:09:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 07:09:10 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 07:14:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 07:14:09 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 19 07:16:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 07:16:03 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 19 07:19:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 07:19:35 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 07:24:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 07:24:14 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 07:26:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 07:26:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 07:28:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 07:28:03 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 07:29:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 07:29:45 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 19 07:34:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 07:34:52 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 07:39:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 07:39:09 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 07:39:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 07:39:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 07:39:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 07:39:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 07:45:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 07:45:24 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 19 07:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 07:50:25 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 07:50:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 07:50:37 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 19 07:55:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 07:55:26 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 19 07:56:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 07:56:21 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 08:00:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 08:00:25 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 19 08:01:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c0901a400, cur 1563548487 expire 1563548337 last 1563548260 Jul 19 08:01:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 19 08:01:33 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 19 08:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 08:05:58 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 19 08:10:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 08:10:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 08:11:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 08:11:55 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 08:16:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 08:16:40 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 08:17:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f28b6cb2c00, cur 1563549427 expire 1563549277 last 1563549200 Jul 19 08:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 08:20:38 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 19 08:22:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 08:22:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 08:27:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 08:27:03 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 08:30:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 08:30:13 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 08:31:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 08:31:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 08:32:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 08:32:11 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 19 08:34:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 08:34:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 08:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 08:37:54 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 19 08:41:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 08:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 08:41:27 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 08:42:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 08:42:16 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 08:46:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 08:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 08:48:32 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 19 08:52:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 08:52:43 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 19 08:54:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 08:54:32 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 19 08:58:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 08:58:37 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 19 08:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 08:59:58 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 19 09:03:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 09:03:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 09:04:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 09:04:51 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 09:08:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 09:08:46 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 19 09:13:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 09:13:13 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 09:13:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f207b142800, cur 1563552802 expire 1563552652 last 1563552575 Jul 19 09:14:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 09:14:51 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 19 09:16:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 09:16:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 09:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 09:19:00 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 09:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 09:23:24 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 09:27:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 09:27:11 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 09:27:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 09:27:49 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 19 09:29:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 09:29:08 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 19 09:33:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 09:33:25 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 19 09:39:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 09:39:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 19 09:39:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 09:39:21 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 19 09:40:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 09:40:06 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 19 09:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 09:44:31 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 19 09:46:30 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 09:49:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 09:49:44 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 19 09:51:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 09:51:34 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 09:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 09:54:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 19 09:56:38 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 09:58:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 10:00:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 10:00:15 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 10:01:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 10:01:40 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 10:05:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 10:05:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 10:10:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 10:10:23 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 10:11:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 10:11:44 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 10:13:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 10:13:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 10:15:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 10:15:23 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 10:21:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 10:21:43 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 10:21:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 10:21:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 10:23:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 10:23:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 10:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 10:25:34 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 19 10:27:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e45ecf800, cur 1563557257 expire 1563557107 last 1563557030 Jul 19 10:32:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 10:32:03 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 19 10:32:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 10:32:03 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 19 10:34:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 10:34:41 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 10:36:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 10:36:12 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 19 10:42:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 10:42:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 19 10:42:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 10:42:08 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 19 10:44:06 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 10:45:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 10:45:49 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 10:47:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 10:47:29 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 10:52:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 10:52:15 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 19 10:52:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 10:52:15 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 19 10:58:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 10:58:00 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 11:02:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 11:02:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 19 11:02:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 11:02:44 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 11:08:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 11:08:50 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 11:09:46 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 11:10:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 11:10:11 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 19 11:12:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 11:12:49 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 11:12:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 11:12:49 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 19 11:13:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 11:16:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 11:16:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 11:18:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 11:18:56 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 19 11:22:36 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 11:22:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 11:22:59 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 11:22:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 11:22:59 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 19 11:28:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 11:29:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 11:29:12 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 11:31:53 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 11:33:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 11:33:45 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 11:33:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 11:33:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 11:34:02 fir-md1-s1 kernel: Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563561235/real 1563561235] req@ffff8f14c35a7500 x1636738116958112/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1563561242 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 19 11:34:02 fir-md1-s1 kernel: Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 19 11:39:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 648e4fba-c6a2-795e-119b-f6cc51efcbae (at 10.9.0.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f44e1ac7800, cur 1563561553 expire 1563561403 last 1563561326 Jul 19 11:39:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 11:39:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 11:40:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b74b4b66-65f0-f951-331c-463b7f96e033 (at 10.9.0.62@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f063bdc7000, cur 1563561629 expire 1563561479 last 1563561402 Jul 19 11:42:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 11:42:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 11:43:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 11:43:46 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 19 11:43:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 11:43:46 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 19 11:48:10 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563562079/real 1563562079] req@ffff8f2f8c8ce000 x1636738122502208/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1563562090 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 19 11:48:14 fir-md1-s1 kernel: Lustre: 21415:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2db0b2d100 x1638277807950688/t0(0) o36->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:19/0 lens 520/2888 e 1 to 0 dl 1563562099 ref 2 fl Interpret:/0/0 rc 0/0 Jul 19 11:48:14 fir-md1-s1 kernel: Lustre: 21415:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 19 11:48:21 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563562090/real 1563562090] req@ffff8f2f8c8ce000 x1636738122502208/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1563562101 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 19 11:48:32 fir-md1-s1 kernel: 
Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563562101/real 1563562101] req@ffff8f2f8c8ce000 x1636738122502208/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1563562112 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 19 11:48:32 fir-md1-s1 kernel: LustreError: 23740:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.0.62@o2ib4) failed to reply to blocking AST (req@ffff8f2f8c8ce000 x1636738122502208 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2ed8fd98c0/0x5d9ee677a2b33b6a lrc: 4/0,0 mode: PR/PR res: [0x2c0024163:0x19838:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0xbb8f9eff54b295d7 expref: 2778703 pid: 23622 timeout: 2677190 lvb_type: 0 Jul 19 11:48:32 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.0.62@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Jul 19 11:48:32 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 33s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ed8fd98c0/0x5d9ee677a2b33b6a lrc: 3/0,0 mode: PR/PR res: [0x2c0024163:0x19838:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0xbb8f9eff54b295d7 expref: 2778701 pid: 23622 timeout: 0 lvb_type: 0 Jul 19 11:48:32 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 19 11:48:32 fir-md1-s1 kernel: LustreError: 31001:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.62@o2ib4 arrived at 1563562112 with bad export cookie 6746082393397974099 Jul 19 11:48:36 fir-md1-s1 kernel: LustreError: 25077:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.62@o2ib4 arrived at 1563562116 with bad export cookie 6746082393397974099 Jul 19 11:48:36 fir-md1-s1 kernel: LustreError: 
25077:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 160 previous similar messages Jul 19 11:48:44 fir-md1-s1 kernel: LustreError: 31014:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.62@o2ib4 arrived at 1563562124 with bad export cookie 6746082393397974099 Jul 19 11:48:44 fir-md1-s1 kernel: LustreError: 31014:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 313 previous similar messages Jul 19 11:49:00 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.62@o2ib4 arrived at 1563562140 with bad export cookie 6746082393397974099 Jul 19 11:49:00 fir-md1-s1 kernel: LustreError: 21765:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 604 previous similar messages Jul 19 11:49:32 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563562082, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2fcbf61b00/0x5d9ee677a5451f55 lrc: 3/1,0 mode: --/PR res: [0x2c0024163:0x19838:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21003 timeout: 0 lvb_type: 0 Jul 19 11:49:32 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 19 11:49:32 fir-md1-s1 kernel: LustreError: 23042:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.62@o2ib4 arrived at 1563562172 with bad export cookie 6746082393397974099 Jul 19 11:49:32 fir-md1-s1 kernel: LustreError: 23042:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1246 previous similar messages Jul 19 11:50:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ef0748a0-58bc-3624-ed96-74860cd1e591 (at 10.8.0.66@o2ib6) reconnecting Jul 19 11:50:05 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 11:50:17 fir-md1-s1 kernel: Lustre: 
23659:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f1bcbc500 x1638076510906000/t0(0) o101->f0a8fbb7-06c4-ed16-a94f-6cea310ceb29@10.8.0.82@o2ib6:22/0 lens 480/568 e 0 to 0 dl 1563562222 ref 2 fl Interpret:/0/0 rc 0/0 Jul 19 11:50:17 fir-md1-s1 kernel: Lustre: 23659:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 19 11:50:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.82@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f237a0eaac0/0x5d9ee677a5741e72 lrc: 3/0,0 mode: PW/PW res: [0x2c002c43d:0x419b:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.0.82@o2ib6 remote: 0xac353b34c4d0b148 expref: 29 pid: 97640 timeout: 2677281 lvb_type: 0 Jul 19 11:50:21 fir-md1-s1 kernel: LustreError: 21415:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1ecd050000 ns: mdt-fir-MDT0002_UUID lock: ffff8f0e05264a40/0x5d9ee677a5741eb1 lrc: 3/0,0 mode: PW/PW res: [0x2c002c43d:0x419b:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x50200400000020 nid: 10.8.0.82@o2ib6 remote: 0xac353b34c4d0b14f expref: 15 pid: 21415 timeout: 0 lvb_type: 0 Jul 19 11:50:21 fir-md1-s1 kernel: LustreError: 21415:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 19 11:51:19 fir-md1-s1 kernel: LNet: Service thread pid 23740 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 19 11:51:19 fir-md1-s1 kernel: Pid: 23740, comm: mdt02_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 19 11:51:19 fir-md1-s1 kernel: Call Trace: Jul 19 11:51:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 19 11:51:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_reint_unlink+0x1e7/0x1430 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 19 11:51:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 19 11:51:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 19 11:51:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 19 11:51:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 19 11:51:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 19 11:51:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 19 11:51:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563562279.23740 Jul 19 11:51:23 fir-md1-s1 kernel: Pid: 21003, comm: mdt02_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 19 11:51:23 fir-md1-s1 kernel: Call Trace: Jul 19 11:51:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 19 11:51:23 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 19 11:51:23 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 
[mdt] Jul 19 11:51:23 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 19 11:51:23 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 19 11:51:23 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 19 11:51:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 19 11:51:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 19 11:51:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 19 11:51:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563562283.21003 Jul 19 11:53:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 19 11:53:49 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 11:53:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 11:53:49 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 19 11:56:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 11:56:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 12:00:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ef0748a0-58bc-3624-ed96-74860cd1e591 (at 10.8.0.66@o2ib6) reconnecting Jul 19 12:00:14 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 19 12:00:59 fir-md1-s1 kernel: LNet: Service thread pid 23740 completed after 780.52s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 19 12:00:59 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 19 12:00:59 fir-md1-s1 kernel: LustreError: 23616:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2673ecf500 x1636738129586736/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 19 12:00:59 fir-md1-s1 kernel: LustreError: 23616:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 19 12:01:17 fir-md1-s1 kernel: Lustre: 21481:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1b45e88c00 x1638277812013584/t0(0) o101->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:22/0 lens 600/3264 e 1 to 0 dl 1563562882 ref 2 fl Interpret:/0/0 rc 0/0 Jul 19 12:01:17 fir-md1-s1 kernel: Lustre: 21481:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 19 12:01:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f33cfd2b180/0x5d9ee67791e3aecc lrc: 3/0,0 mode: PR/PR res: [0x2c0016eee:0x16ca4:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0xbb8f9eff541c5e07 expref: 708724 pid: 97641 timeout: 2677948 lvb_type: 0 Jul 19 12:01:47 fir-md1-s1 kernel: LustreError: 23750:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2697707800 x1636738129965264/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 19 12:03:17 fir-md1-s1 kernel: LustreError: 23750:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563562907, 90s ago); not entering recovery in server code, just going back to sleep ns: 
mdt-fir-MDT0002_UUID lock: ffff8f2de8463cc0/0x5d9ee677a79d81a6 lrc: 3/0,1 mode: --/CW res: [0x2c0016eee:0x16f5a:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23750 timeout: 0 lvb_type: 0 Jul 19 12:03:17 fir-md1-s1 kernel: LustreError: 23750:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 19 12:03:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Jul 19 12:03:51 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 19 12:04:53 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:1s); client may timeout. req@ffff8f2abb768000 x1638277812212768/t353700133961(0) o36->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:17/0 lens 488/424 e 0 to 0 dl 1563563092 ref 1 fl Complete:/0/0 rc 0/0 Jul 19 12:04:53 fir-md1-s1 kernel: LustreError: 21483:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1d137ab600 x1636738131105072/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 19 12:04:53 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jul 19 12:05:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 12:05:35 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 19 12:05:36 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2fbe326f00 x1638277813316288/t0(0) o101->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:11/0 lens 600/3264 e 0 to 0 dl 1563563141 ref 2 fl Interpret:/0/0 rc 0/0 Jul 19 12:05:36 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) 
Skipped 3 previous similar messages Jul 19 12:05:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f9d945c40/0x5d9ee6779436630e lrc: 3/0,0 mode: PR/PR res: [0x2c0016eee:0x17248:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0xbb8f9eff54331715 expref: 397673 pid: 23695 timeout: 2678197 lvb_type: 0 Jul 19 12:05:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 19 12:06:38 fir-md1-s1 kernel: LustreError: 23652:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563563108, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f295a730b40/0x5d9ee677a808e905 lrc: 3/0,1 mode: --/EX res: [0x2c0016eee:0x17248:0x0].0x0 bits 0x3/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23652 timeout: 0 lvb_type: 0 Jul 19 12:06:38 fir-md1-s1 kernel: LustreError: 23652:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 19 12:08:22 fir-md1-s1 kernel: LustreError: 50445:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1828c2ec00 x1636738132272192/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 19 12:08:22 fir-md1-s1 kernel: LustreError: 50445:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Jul 19 12:10:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 12:10:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 12:10:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 12:10:18 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 19 12:10:48 fir-md1-s1 kernel: LustreError: 23665:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563563358, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2d93108900/0x5d9ee677a87492ac lrc: 3/0,1 mode: --/EX res: [0x2c0016ef8:0x3bc0:0x0].0x0 bits 0x3/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23665 timeout: 0 lvb_type: 0 Jul 19 12:10:48 fir-md1-s1 kernel: LustreError: 23665:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 19 12:14:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 12:14:30 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 19 12:15:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 12:15:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 12:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 12:20:54 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 12:24:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 12:24:46 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 12:26:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 12:26:00 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 19 12:31:05 
fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 12:31:05 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 12:31:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 12:31:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 12:35:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 12:35:03 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 19 12:39:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 12:39:29 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 19 12:41:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 12:41:23 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 12:45:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 12:45:05 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 19 12:46:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 12:46:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 12:49:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 12:50:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 19 12:50:58 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 19 12:52:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 12:52:38 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 12:54:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 12:55:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 12:55:40 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 19 12:56:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 13:00:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 13:01:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 13:01:01 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 19 13:02:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 13:02:52 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 13:06:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 13:06:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 13:12:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 13:12:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 13:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 13:13:41 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 13:16:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 13:16:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 13:17:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 13:17:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 13:21:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 13:22:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 13:22:49 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 13:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 13:24:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 13:26:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9ae1f0fe-eaee-c098-1dc4-7b1298c80249 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd823800, cur 1563567971 expire 1563567821 last 1563567744 Jul 19 13:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 13:26:22 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 13:32:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 13:32:52 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 13:36:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 13:36:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 19 13:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 13:37:15 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 19 13:39:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 13:40:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 13:43:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 19 13:43:27 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 13:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 13:47:38 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 19 13:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 13:47:38 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 19 13:53:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 13:53:37 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 13:57:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 13:57:45 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 19 13:58:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 13:58:28 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 19 14:05:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 14:05:31 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 19 14:07:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 14:07:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 14:07:51 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 19 14:08:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:08:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 14:08:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 14:13:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:15:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 14:15:53 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 19 14:18:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 14:18:01 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 19 14:18:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 14:18:55 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 14:26:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 14:26:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 19 14:27:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 14:28:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 14:28:05 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 19 14:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 14:29:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:29:42 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 19 14:30:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:36:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 14:36:30 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 14:37:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:38:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 14:38:27 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 19 14:38:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1f6469d3-d26f-d8c9-bf51-966fcd210811 (at 10.8.2.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24ff6d0000, cur 1563572317 expire 1563572167 last 1563572090 Jul 19 14:38:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 14:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 14:40:04 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 14:41:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:42:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:43:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d5145b19-7e77-2465-cb06-19cf549382e1 (at 10.8.7.8@o2ib6) in 173 seconds. I think it's dead, and I am evicting it. exp ffff8f24e7ff7c00, cur 1563572583 expire 1563572433 last 1563572410 Jul 19 14:43:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 14:43:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:43:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15ad1a4400, cur 1563572637 expire 1563572487 last 1563572410 Jul 19 14:45:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 14:45:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6f33ca71-3722-0ecf-29e7-f68f6effb820 (at 10.8.27.10@o2ib6) in 199 seconds. 
I think it's dead, and I am evicting it. exp ffff8f260301c800, cur 1563572713 expire 1563572563 last 1563572514 Jul 19 14:45:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 19 14:45:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6f33ca71-3722-0ecf-29e7-f68f6effb820 (at 10.8.27.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34f56c9400, cur 1563572741 expire 1563572591 last 1563572514 Jul 19 14:45:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 14:47:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 14:47:13 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 14:48:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 14:48:30 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 19 14:49:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 14:50:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 14:50:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 14:57:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 14:57:46 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 14:58:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 14:58:36 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 15:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 15:01:42 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 15:05:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 15:05:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 15:07:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 15:07:47 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 15:08:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 15:08:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 15:08:36 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 19 15:09:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 15:09:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 15:11:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 15:11:56 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 15:13:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 15:17:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 15:17:54 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 19 15:19:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 15:19:21 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 19 15:20:34 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:21:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 15:21:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 15:21:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 15:27:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 15:27:54 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 15:29:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 15:29:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 19 15:32:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 19 15:32:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 15:33:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6f9dd14c-d28d-0c0c-6110-1924893356e4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f322af69c00, cur 1563575596 expire 1563575446 last 1563575369 Jul 19 15:33:16 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 19 15:37:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 15:37:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 15:37:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 15:37:59 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 15:39:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 15:39:32 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 19 15:41:54 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:42:06 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 62c3a024-34de-fd61-6956-bb3675e9d145 (at 10.8.1.13@o2ib6) reconnecting Jul 19 15:42:13 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 19 15:42:14 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:42:51 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:43:12 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:43:32 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:46:59 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 15:47:39 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't 
perform health checking (-125, 0) Jul 19 15:49:13 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:49:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 15:49:37 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 19 15:50:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 15:50:38 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 19 15:52:19 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:52:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6) reconnecting Jul 19 15:52:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 15:53:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 15:53:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 15:57:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 633932d2-f74b-27df-f58a-40b99ddb5683 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fdc42c00, cur 1563577042 expire 1563576892 last 1563576815 Jul 19 15:57:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 15:57:44 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 15:57:44 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 7 previous similar messages Jul 19 15:59:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 15:59:38 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 19 16:00:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 16:00:46 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 19 16:02:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 16:02:46 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 19 16:04:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 16:04:05 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 16:08:52 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 16:08:52 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 73 previous similar messages Jul 19 16:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 19 16:09:50 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 19 16:13:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 16:13:00 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 16:15:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 16:15:41 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 16:17:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 16:17:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 16:19:06 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 16:19:06 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages Jul 19 16:21:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 16:21:11 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 16:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 16:23:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 19 16:25:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 16:25:41 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 19 16:28:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 90d886c9-febb-d72b-4230-195cbd7387f5 (at 10.8.21.21@o2ib6) in 168 seconds. I think it's dead, and I am evicting it. exp ffff8f1a7b33ec00, cur 1563578910 expire 1563578760 last 1563578742 Jul 19 16:28:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 16:29:08 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 19 16:29:08 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 223 previous similar messages Jul 19 16:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 90d886c9-febb-d72b-4230-195cbd7387f5 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1817b4cc00, cur 1563578969 expire 1563578819 last 1563578742 Jul 19 16:29:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 16:29:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 16:31:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f591177e-975b-faaf-66dc-6790347db0fe (at 10.8.8.27@o2ib6) Jul 19 16:31:18 fir-md1-s1 kernel: Lustre: Skipped 326 previous similar messages Jul 19 16:33:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client dd08efde-bd41-c023-e932-0440db96590e (at 10.8.27.2@o2ib6) reconnecting Jul 19 16:33:18 fir-md1-s1 kernel: Lustre: Skipped 341 previous similar messages Jul 19 16:35:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 16:35:47 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 19 16:37:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34533b2400, cur 1563579457 expire 1563579307 last 1563579230 Jul 19 16:37:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 19 16:39:42 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 19 16:39:42 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 133 previous similar messages Jul 19 16:41:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 16:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 16:42:13 fir-md1-s1 kernel: Lustre: Skipped 152 previous similar messages Jul 19 16:43:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 16:43:59 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 19 16:46:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 16:46:10 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 16:51:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 16:51:49 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 16:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 16:52:58 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 19 16:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 16:54:32 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 19 16:56:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 16:56:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 17:02:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 17:02:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 17:03:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 17:03:37 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 17:03:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cc57ad24-07f9-6270-9e45-e86bdff220e7 (at 10.8.2.27@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4502b95800, cur 1563581035 expire 1563580885 last 1563580808 Jul 19 17:04:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 19 17:04:54 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 17:06:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 17:06:24 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 17:10:45 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563581438/real 1563581438] req@ffff8f24f5b13000 x1636738693888064/t0(0) o104->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563581445 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 19 17:10:45 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 19 17:13:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 17:13:41 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 19 17:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 17:15:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 17:16:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 17:16:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 17:17:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 17:17:15 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 19 17:18:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6da9dbbb-6f9e-13c3-beda-be1a9c5c5e6f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0bad2dec00, cur 1563581934 expire 1563581784 last 1563581707 Jul 19 17:18:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 17:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 17:24:14 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 17:25:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 17:25:56 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 17:27:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 17:27:17 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 17:35:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 17:35:14 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 19 17:36:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 17:36:12 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 17:37:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 17:37:47 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 17:38:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 17:38:18 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 19 17:40:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 17:45:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 17:45:30 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 19 17:46:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 17:46:19 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 17:48:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 17:48:19 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 19 17:48:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7b862030-a9dc-ab17-036f-c196eec9160d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f448d294000, cur 1563583736 expire 1563583586 last 1563583509 Jul 19 17:48:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 17:49:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 17:49:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 17:55:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 17:55:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 17:55:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 17:55:45 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 19 17:56:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 17:56:30 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 18:02:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 18:02:03 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 19 18:03:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee1f3cc00, cur 1563584589 expire 1563584439 last 1563584362 Jul 19 18:03:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 18:04:25 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f83a4742-27bc-0f79-92b4-6c1b114fbebe (at 10.8.21.21@o2ib6) in 171 seconds. I think it's dead, and I am evicting it. exp ffff8f3d2522e800, cur 1563584665 expire 1563584515 last 1563584494 Jul 19 18:05:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client deb8653a-9a59-739c-c950-d6f80e1fee7d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34b4b56800, cur 1563584721 expire 1563584571 last 1563584494 Jul 19 18:06:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 18:06:11 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 19 18:06:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 18:06:12 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 19 18:07:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 18:07:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 18:12:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 18:12:13 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 18:16:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Jul 19 18:16:13 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 19 18:16:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 33669d8e-0a79-1af7-2836-9b07953f4f50 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2521881c00, cur 1563585374 expire 1563585224 last 1563585147 Jul 19 18:16:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 19 18:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 18:17:22 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 18:19:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 19 18:19:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 18:23:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 18:23:02 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 18:26:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 18:26:51 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 19 18:27:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 18:27:27 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 19 18:30:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 18:30:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 18:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 570258f0-3726-ee23-f57d-853a5557d2ec (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e0078ac00, cur 1563586291 expire 1563586141 last 1563586064 Jul 19 18:31:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 18:33:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 18:33:10 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 19 18:36:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 18:36:56 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 19 18:37:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 18:37:34 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 18:42:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 18:42:03 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 18:46:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 18:46:00 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 18:47:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 18:47:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 18:47:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 18:47:53 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 19 18:53:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 18:53:01 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 18:56:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 18:56:08 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 19 18:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 18:57:11 fir-md1-s1 kernel: Lustre: Skipped 120 previous similar messages Jul 19 18:57:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0c920b14-78d7-437b-0dc2-c06c0ff953ec (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0aaa465000, cur 1563587854 expire 1563587704 last 1563587627 Jul 19 18:57:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 18:57:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 18:57:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 19 19:06:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 19:06:37 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 19 19:06:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0f939f3b-d46b-abdd-a97b-14ae391181f5 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33bca47000, cur 1563588403 expire 1563588253 last 1563588176 Jul 19 19:06:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 19:07:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 19:07:12 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 19 19:07:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 19 19:07:22 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 19 19:08:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 19:08:05 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 19 19:11:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 044e808a-e0c8-98d2-d381-81fbc4657aeb (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2deaf98400, cur 1563588687 expire 1563588537 last 1563588460 Jul 19 19:11:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 19:15:31 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c8c6d380-5672-beca-90c0-967f222494f1 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f6d1e1c00, cur 1563588931 expire 1563588781 last 1563588704 Jul 19 19:15:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 19 19:17:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 19:17:15 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 19 19:18:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 19:18:14 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 19 19:18:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 19:18:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 19:19:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 19:19:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 19 19:27:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 19:27:16 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 19:27:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 70d26392-3254-86c5-35bc-03d5eb15133c (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f288c184800, cur 1563589637 expire 1563589487 last 1563589410 Jul 19 19:27:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 19 19:29:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 19:29:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 19:29:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 19:29:42 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 19:32:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 19:32:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 19:33:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ece9a09c-f33c-1c56-6b42-75ad709a0a32 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2af2684000, cur 1563590020 expire 1563589870 last 1563589793 Jul 19 19:33:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 19 19:34:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 191 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1a3972a400, cur 1563590096 expire 1563589946 last 1563589905 Jul 19 19:34:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 19:37:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 19:37:17 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 19:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 19:39:09 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 19 19:40:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 19 19:40:06 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 19 19:47:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 19:47:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 19:47:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 19:47:35 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 19:49:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 19:49:27 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 19:50:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 19:50:36 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 19 19:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 19:57:37 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 19 20:00:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 20:00:13 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 20:01:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 20:01:37 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 19 20:04:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2095915c00, cur 1563591897 expire 1563591747 last 1563591670 Jul 19 20:06:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 20:06:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 20:07:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 20:07:45 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 19 20:09:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 54cc1e48-49ff-70c2-7d9b-bc3c08e5eb90 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f44ce6acc00, cur 1563592146 expire 1563591996 last 1563591919 Jul 19 20:10:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 20:10:14 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 19 20:11:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 19 20:11:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 20:16:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 20:17:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 20:17:53 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 20:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 20:20:28 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 19 20:22:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 19 20:22:14 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 20:28:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 20:28:04 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 20:32:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 20:32:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 19 20:32:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 20:32:45 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 20:38:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 20:38:07 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 19 20:42:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 20:42:22 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 20:42:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e19526ba-6234-aa28-444a-ad61b17fff75 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3a083b8800, cur 1563594165 expire 1563594015 last 1563593938 Jul 19 20:42:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 20:43:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 20:43:21 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 20:48:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 20:48:21 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 19 20:52:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 20:52:26 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 19 20:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 20:53:38 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 19 20:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 51fe2f63-10c7-7630-b83e-66803884dcdc (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e96b18400, cur 1563595018 expire 1563594868 last 1563594791 Jul 19 20:56:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 20:58:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 20:58:37 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 19 21:03:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 21:03:46 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 19 21:04:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 21:04:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 21:04:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17b9633a-f84e-22b0-5e9c-55e6447d9e30 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252295a400, cur 1563595456 expire 1563595306 last 1563595229 Jul 19 21:04:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 21:08:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 21:08:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 21:08:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 21:08:38 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 19 21:13:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f3d262a8-72d9-acef-80df-25fe1d311bc8 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ec033c800, cur 1563596014 expire 1563595864 last 1563595787 Jul 19 21:13:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 21:14:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 21:14:16 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 21:15:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 21:15:37 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 19 21:16:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 21:16:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 21:19:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 19 21:19:06 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 19 21:24:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 21:24:39 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 21:24:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 21:24:40 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 21:27:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 21:27:18 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 19 21:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 21:29:35 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 19 21:30:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 21:30:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 21:34:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 21:34:57 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 19 21:39:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 19 21:39:02 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 19 21:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 21:39:45 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 19 21:40:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 21:40:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 21:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 21:45:16 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 21:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e3643bab-5fa9-2900-e72f-07a9523f9dcc (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3475847000, cur 1563598058 expire 1563597908 last 1563597831 Jul 19 21:47:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 19 21:49:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 21:49:05 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 19 21:50:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 21:50:39 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 19 21:53:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 21:53:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 21:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 21:55:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 19 22:00:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1e29b0a1-dec1-1632-321f-6218afb3ae03 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f21ac336000, cur 1563598805 expire 1563598655 last 1563598578 Jul 19 22:00:05 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 19 22:00:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.12@o2ib6, removing former export from same NID Jul 19 22:00:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 22:00:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 22:00:39 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 22:06:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 22:06:13 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 19 22:06:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 13a906e5-e289-135a-98eb-8d511e1e2af3 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2479137000, cur 1563599188 expire 1563599038 last 1563598961 Jul 19 22:06:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 22:10:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 22:10:08 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 22:10:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 22:10:33 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 19 22:10:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 22:10:40 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 19 22:16:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 19 22:16:24 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 19 22:20:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 22:20:08 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 19 22:20:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 22:20:35 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 22:20:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 22:20:46 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 19 22:27:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 22:27:11 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 19 22:31:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 22:31:03 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 19 22:31:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 22:31:03 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 19 22:32:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 22:32:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 22:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fe6bcd11-20b3-9fda-e8fa-0241b6f35235 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f267736f800, cur 1563601021 expire 1563600871 last 1563600794 Jul 19 22:37:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 22:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 22:37:12 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 19 22:41:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 22:41:06 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 22:43:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 22:43:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 19 22:44:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 22:44:37 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 22:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 22:47:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 22:51:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 22:51:28 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 19 22:53:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 19 22:53:34 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 19 22:56:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 22:56:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 19 22:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ceedc62c-f055-2057-2665-92cf0aa332ae (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2502cca400, cur 1563602249 expire 1563602099 last 1563602022 Jul 19 22:57:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 22:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 19 22:57:37 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 19 23:01:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 23:01:36 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 19 23:04:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 23:04:29 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 19 23:06:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 23:06:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 23:07:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 23:07:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 19 23:09:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client db3aa326-ce8a-ee62-bf32-a8a2073495bf (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0e3b043800, cur 1563602946 expire 1563602796 last 1563602719 Jul 19 23:09:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 23:11:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 23:11:38 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 19 23:15:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 19 23:15:30 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 19 23:17:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 19 23:17:56 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 19 23:18:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 23:18:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 23:21:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 19 23:21:46 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 19 23:25:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a3736407-86dc-679e-f268-b1db34b58dd6 (at 10.8.23.12@o2ib6) in 207 seconds. I think it's dead, and I am evicting it. 
exp ffff8f267e3ff800, cur 1563603933 expire 1563603783 last 1563603726 Jul 19 23:25:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 23:25:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 23:25:33 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 19 23:25:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a3736407-86dc-679e-f268-b1db34b58dd6 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f290688c400, cur 1563603953 expire 1563603803 last 1563603726 Jul 19 23:25:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 19 23:30:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 23:30:44 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 19 23:31:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 23:31:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 19 23:31:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 23:31:48 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 19 23:35:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 23:35:34 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 19 23:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 23:41:11 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 19 23:42:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 19 23:42:02 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 19 23:42:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 19 23:42:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 19 23:43:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ef893c27-9690-3fa0-5df5-0f04caafdaa9 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2632bc9400, cur 1563604984 expire 1563604834 last 1563604757 Jul 19 23:43:04 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 19 23:45:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 19 23:45:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 19 23:46:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1ae8f2d000, cur 1563605200 expire 1563605050 last 1563604973 Jul 19 23:46:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 19 23:48:12 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 19 23:50:02 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 19 23:50:02 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages Jul 19 23:50:10 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 19 23:50:10 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jul 19 23:50:25 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 19 23:52:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 19 23:52:11 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 19 23:53:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 19 23:53:05 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 19 23:56:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 19 23:56:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 19 23:57:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 19 23:57:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 19 23:59:45 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 20 00:01:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ac1f8ed7-ebfb-3e98-9aa9-f016992cd64b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2219e92400, cur 1563606114 expire 1563605964 last 1563605887 Jul 20 00:02:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 00:02:17 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 20 00:03:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 00:03:11 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 20 00:06:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 00:06:50 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 20 00:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 866964dd-2818-0da3-7970-e47bfeb7a77f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f325fe1e400, cur 1563606616 expire 1563606466 last 1563606389 Jul 20 00:10:16 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 20 00:12:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 00:12:24 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 20 00:14:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 00:14:25 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 00:15:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 00:17:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 14b19d24-49aa-e1cf-3634-1d5dcf4d34cf (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b6f57b000, cur 1563607050 expire 1563606900 last 1563606823 Jul 20 00:17:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 00:19:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 00:19:05 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 20 00:22:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 00:22:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 00:24:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 00:24:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 20 00:29:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 00:29:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar 
messages Jul 20 00:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b4bff480-0f0b-e0d6-fab7-0c57a056a6eb (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f398ffa0c00, cur 1563607898 expire 1563607748 last 1563607671 Jul 20 00:31:38 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 20 00:32:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 00:32:48 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 00:34:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 00:35:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 37a5adf8-7d80-4241-9fba-33f17d4628f7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ea853d000, cur 1563608159 expire 1563608009 last 1563607932 Jul 20 00:35:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 00:36:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 00:36:20 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 20 00:39:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 00:39:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 00:42:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 00:42:56 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 20 00:47:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 00:47:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 20 00:49:18 
fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 00:49:18 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 00:50:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 00:50:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 00:51:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 63b72caa-1aca-21e7-c896-b7a88baed8f9 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23e7fba000, cur 1563609077 expire 1563608927 last 1563608850 Jul 20 00:51:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 00:53:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 00:53:09 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 20 00:57:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 00:57:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 20 00:59:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 00:59:24 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 20 01:00:07 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 01:00:07 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jul 20 01:03:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 01:03:17 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 
20 01:05:56 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 01:08:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b982e10a-b572-102a-4542-7bf1b80ed937 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ab36ec800, cur 1563610109 expire 1563609959 last 1563609882 Jul 20 01:08:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 01:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 01:08:44 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 20 01:10:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 01:10:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 01:13:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 01:13:31 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 20 01:19:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 01:19:06 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 01:20:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 01:20:30 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 20 01:22:54 fir-md1-s1 kernel: Lustre: 23571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 01:23:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 01:23:37 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 20 01:25:02 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 01:25:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 01:28:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1b88b104-77b9-2949-3028-409a38cdfc9a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f290c7e1400, cur 1563611284 expire 1563611134 last 1563611057 Jul 20 01:28:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 20 01:29:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 01:29:10 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 20 01:30:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 01:30:46 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 20 01:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 01:33:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 01:33:44 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 20 01:39:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 01:39:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 01:40:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 01:40:15 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 20 01:40:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 20 01:40:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 01:43:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 092c2fa8-e35a-4eb9-f97f-7a5ff117f12f (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e1d7c5c00, cur 1563612197 expire 1563612047 last 1563611970 Jul 20 01:43:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 01:43:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 01:43:53 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 20 01:44:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client acfc5067-0178-c224-9d5b-3340c584534b (at 10.8.23.14@o2ib6) in 210 seconds. I think it's dead, and I am evicting it. exp ffff8f36f3b50400, cur 1563612273 expire 1563612123 last 1563612063 Jul 20 01:44:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 01:44:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 01:44:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 01:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client acfc5067-0178-c224-9d5b-3340c584534b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f36f3b57000, cur 1563612290 expire 1563612140 last 1563612063 Jul 20 01:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 01:50:31 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 20 01:51:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 01:51:21 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 01:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 01:54:02 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 20 01:55:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 01:55:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 02:00:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 02:00:49 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 20 02:01:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 02:01:28 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 20 02:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 02:06:04 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 20 02:07:44 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 02:07:44 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 02:08:04 fir-md1-s1 kernel: LustreError: 
137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 02:08:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 02:11:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 02:11:00 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 20 02:11:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 20 02:11:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 02:16:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 02:16:52 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 20 02:17:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96f6386b-c7f5-ccb6-2032-17ed618d03ba (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bb24c2000, cur 1563614239 expire 1563614089 last 1563614012 Jul 20 02:17:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 20 02:20:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 02:20:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 02:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 02:22:05 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 20 02:23:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 02:23:25 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 02:26:56 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 02:26:56 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jul 20 02:27:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 02:27:02 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 20 02:32:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 02:32:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 02:34:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 02:34:02 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 20 02:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ed128d30-f8e8-6d93-1d6e-30e7e74eb718 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f287fb03400, cur 1563615280 expire 1563615130 last 1563615053 Jul 20 02:34:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 20 02:35:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 20 02:35:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 02:37:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 02:37:12 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 20 02:42:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 02:42:48 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 20 02:46:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 02:46:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 20 02:47:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 02:47:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 02:53:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 02:53:30 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 02:55:49 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 02:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 02:57:24 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 20 02:58:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 02:58:21 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 20 02:58:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 20 02:58:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 03:01:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 03:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 03:04:03 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 20 03:07:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 03:07:40 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 20 03:10:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1bc2b3bd-04a2-fda6-ac1c-82ec609882c3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ff3e0d400, cur 1563617408 expire 1563617258 last 1563617181 Jul 20 03:10:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 20 03:11:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 03:11:53 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 20 03:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 03:14:21 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 20 03:17:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 03:17:54 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 20 03:22:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 03:22:01 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 20 03:24:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 03:24:47 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 20 03:25:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b4a56f3d-4024-bf3b-5f41-822fbd1ee229 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f27a14a7000, cur 1563618356 expire 1563618206 last 1563618129 Jul 20 03:25:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 03:27:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f2980a0d-3bca-5312-f402-7fa0fb1e1646 (at 10.8.23.14@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f66c20800, cur 1563618432 expire 1563618282 last 1563618215 Jul 20 03:27:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 03:27:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 26e38d76-f5b1-242f-8cef-6b321f69f794 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3983fe9400, cur 1563618442 expire 1563618292 last 1563618215 Jul 20 03:27:49 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 03:27:49 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 25 previous similar messages Jul 20 03:28:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 03:28:01 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 03:32:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 03:32:57 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 03:35:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 03:35:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 03:38:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 03:38:01 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 20 03:44:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 03:44:04 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 03:45:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 03:45:08 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 20 03:48:40 
fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 03:48:40 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 20 03:48:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 03:49:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 03:54:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 03:54:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 03:56:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 03:56:12 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 20 03:57:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 03:58:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 03:58:51 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 20 04:04:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 04:04:30 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 04:05:12 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:05:12 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 22 previous similar messages Jul 20 04:05:55 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:05:55 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 20 04:06:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 04:06:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 04:08:26 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:08:26 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jul 20 04:08:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 04:08:51 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 20 04:09:01 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:11:47 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for 
user 1: -22 Jul 20 04:12:07 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:12:07 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jul 20 04:12:49 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:12:49 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jul 20 04:14:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 04:15:08 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:15:08 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jul 20 04:15:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 04:15:51 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 20 04:18:14 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:18:14 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jul 20 04:18:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2a764af7-d16e-8f90-4af1-567d54310c4a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2fc542ec00, cur 1563621499 expire 1563621349 last 1563621272 Jul 20 04:18:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 20 04:18:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 04:18:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 04:18:57 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 20 04:19:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 04:19:25 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 20 04:20:25 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:20:25 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19 previous similar messages Jul 20 04:25:39 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:25:39 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages Jul 20 04:27:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 04:27:42 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 20 04:29:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 04:29:03 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 04:29:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 04:29:45 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar 
messages Jul 20 04:34:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 04:35:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 04:37:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 04:37:47 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 04:39:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 04:39:12 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 20 04:39:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 04:40:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 04:40:08 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 20 04:40:51 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 04:40:51 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jul 20 04:48:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 04:48:12 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 04:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 04:49:35 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 20 04:50:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 04:50:55 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 04:53:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 04:54:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1a5d7d54-df84-6274-31bb-ed7e9811ef84 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23f7f7dc00, cur 1563623674 expire 1563623524 last 1563623447 Jul 20 04:54:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 04:55:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 165 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e9a19fc00, cur 1563623750 expire 1563623600 last 1563623585 Jul 20 04:55:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 04:57:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 04:58:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 04:58:20 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 20 05:01:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 05:01:12 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 05:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 05:01:13 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 20 05:03:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:08:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 05:08:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 05:08:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:10:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9a1f96bd-956d-5aec-d489-d3a8ec027e90 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ef0e4a800, cur 1563624600 expire 1563624450 last 1563624373 Jul 20 05:11:48 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:11:48 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 26 previous similar messages Jul 20 05:13:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 05:13:02 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 20 05:13:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 05:13:02 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 20 05:16:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:18:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 05:18:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 05:18:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:21:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 05:23:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 05:23:05 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 20 05:23:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 05:23:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 20 05:23:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:26:15 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:26:15 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jul 20 05:28:36 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:28:36 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Jul 20 05:28:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 05:28:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 05:30:49 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:33:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 05:33:10 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 20 05:34:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 05:34:47 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar 
messages Jul 20 05:35:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:35:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 05:38:48 fir-md1-s1 kernel: Lustre: 23600:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:38:48 fir-md1-s1 kernel: Lustre: 23600:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jul 20 05:38:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 05:38:52 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 20 05:41:28 fir-md1-s1 kernel: Lustre: 23576:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:41:28 fir-md1-s1 kernel: Lustre: 23576:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages Jul 20 05:43:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 05:43:12 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 20 05:44:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 05:44:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 05:45:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 05:45:22 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 05:46:55 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 05:46:55 fir-md1-s1 kernel: Lustre: 23605:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jul 20 05:49:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 05:49:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 20 05:53:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 05:53:17 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 20 05:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 05:55:28 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 05:55:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 05:55:42 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 05:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b0819e5b-2d1c-9ed8-38cb-c201ec5569cf (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2d958e0c00, cur 1563627418 expire 1563627268 last 1563627191 Jul 20 05:56:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 05:59:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 05:59:26 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 06:01:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 31912972-aedb-5c03-7a6b-e89df5daa8f2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ff67b7000, cur 1563627677 expire 1563627527 last 1563627450 Jul 20 06:01:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 06:03:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 06:03:23 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 20 06:05:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 06:05:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 06:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 06:06:04 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 20 06:10:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 06:10:28 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 20 06:11:19 fir-md1-s1 kernel: Lustre: 10197:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:11:19 fir-md1-s1 kernel: Lustre: 10197:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 56 previous similar messages Jul 20 06:13:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 06:13:23 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 20 06:16:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 06:16:13 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 20 06:17:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7a9a916b-9d7e-8a9c-34ef-61dcd49d1152 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2658935000, cur 1563628671 expire 1563628521 last 1563628444 Jul 20 06:17:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 06:22:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 06:22:34 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 06:23:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 06:23:25 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 20 06:26:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 06:26:30 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 20 06:30:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f38c5bc0400, cur 1563629402 expire 1563629252 last 1563629175 Jul 20 06:30:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 06:31:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 06:31:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 06:32:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 06:32:35 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 20 06:33:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 06:33:29 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 06:34:25 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:34:25 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 20 06:35:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 06:36:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 06:36:43 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 20 06:38:00 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:38:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 06:43:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 06:43:15 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 20 06:43:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 06:43:35 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 20 06:45:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 06:47:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 06:47:01 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 20 06:47:12 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:47:12 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jul 20 06:48:41 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:48:41 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jul 20 06:49:55 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:49:55 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jul 20 06:53:22 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:53:22 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jul 20 06:54:07 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 06:54:07 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 06:56:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 06:56:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 06:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 06:57:24 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 20 06:59:26 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 06:59:26 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jul 20 07:04:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 07:04:17 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 20 07:06:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 07:06:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 07:06:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 07:06:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 07:07:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 07:07:40 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 07:09:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 20 07:11:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4c7ee862-c33b-a5ba-15f2-3865a00f8942 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f6bac3c00, cur 1563631895 expire 1563631745 last 1563631668 Jul 20 07:14:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 07:14:45 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 20 07:16:48 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:16:48 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jul 20 07:17:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 20 07:17:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 20 07:17:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 07:17:57 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 20 07:18:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f31fba77000, cur 1563632280 expire 1563632130 last 1563632053 Jul 20 07:18:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 07:20:05 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:20:05 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 07:20:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 07:21:34 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:21:34 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 07:24:10 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:24:10 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 126 previous similar messages Jul 20 07:24:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 07:24:51 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 20 07:27:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 89e28bdb-10a9-5bab-ebab-6b4b52c4bef0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f287773c800, cur 1563632837 expire 1563632687 last 1563632610 Jul 20 07:28:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 07:28:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 20 07:28:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 07:28:40 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 20 07:29:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 07:29:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 07:29:19 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:29:19 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 162 previous similar messages Jul 20 07:34:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 07:34:55 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 20 07:39:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 07:39:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 07:39:26 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:39:26 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 342 previous similar messages Jul 20 07:39:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 07:39:46 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar 
messages Jul 20 07:40:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 07:40:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 07:45:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 07:45:18 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 20 07:49:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 07:49:45 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 07:49:47 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 07:49:47 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 165 previous similar messages Jul 20 07:50:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 07:50:25 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 07:55:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 07:55:26 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 20 08:00:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 08:00:01 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 20 08:01:14 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 08:01:14 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 522 previous similar messages Jul 20 08:01:33 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 08:01:33 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 08:05:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 08:05:35 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 20 08:06:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 08:06:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 08:08:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 08:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 08:10:13 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 20 08:12:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 08:12:37 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 08:13:28 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 08:13:28 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 35 previous similar messages Jul 20 08:15:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 08:15:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 08:15:53 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 20 08:20:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 08:20:22 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 08:22:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 08:22:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 08:22:49 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 20 08:26:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 08:26:15 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 20 08:26:51 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 08:26:51 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 47 previous similar messages Jul 20 08:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 08:30:34 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 08:34:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 08:34:03 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 08:36:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 08:36:47 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 20 08:38:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 08:38:00 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 08:40:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 08:40:47 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 08:47:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 08:47:05 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 20 08:48:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 08:48:10 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 20 08:48:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 08:48:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 08:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 08:50:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 08:57:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 08:57:06 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 20 08:59:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 08:59:29 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 09:00:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 09:00:58 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 09:01:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 09:01:09 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 09:07:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 09:07:08 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 20 09:09:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 09:09:53 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 09:11:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 09:11:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 09:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 09:11:43 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 20 09:17:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 09:17:54 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 20 09:20:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 09:20:25 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 09:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 09:21:22 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 09:23:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 09:23:46 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 09:27:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 09:27:55 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 20 09:30:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 20 09:30:40 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 20 09:31:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 09:31:36 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 09:37:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 09:37:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 09:38:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 09:38:25 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 20 09:41:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 09:41:38 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 09:42:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 20 09:42:08 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 09:47:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 09:47:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 09:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 09:48:34 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 20 09:51:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 09:51:46 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 09:55:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 09:55:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 20 09:57:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 09:57:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 09:58:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 09:58:59 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 20 10:01:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 10:01:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 10:06:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 10:06:13 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 10:07:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 10:07:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 10:08:27 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:08:27 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 10:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 10:08:59 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 20 10:12:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 10:12:31 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 20 10:16:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 10:16:40 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 10:17:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 10:17:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 10:19:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 10:19:20 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 20 10:22:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 10:22:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 10:25:28 fir-md1-s1 kernel: Lustre: 23600:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:25:28 fir-md1-s1 kernel: Lustre: 23600:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jul 20 10:26:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 10:26:51 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 20 10:27:35 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:27:35 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jul 20 10:29:35 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:29:35 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jul 20 10:29:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 10:29:41 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 20 10:30:51 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:30:51 fir-md1-s1 kernel: Lustre: 
23569:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19 previous similar messages Jul 20 10:32:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 10:32:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 10:34:33 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:34:33 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages Jul 20 10:35:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 10:35:03 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 20 10:36:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 10:36:54 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 10:37:35 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:37:35 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 20 10:39:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 10:39:53 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 20 10:43:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 10:43:00 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 10:43:02 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:43:02 fir-md1-s1 
kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages Jul 20 10:45:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 10:45:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 10:47:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 10:47:19 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 20 10:50:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 10:50:07 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 20 10:50:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 74373387-2786-29d5-b71b-eb992ffb019b (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24f3cc6c00, cur 1563645021 expire 1563644871 last 1563644794 Jul 20 10:50:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 10:53:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 10:53:14 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 10:54:50 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 10:54:50 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 74 previous similar messages Jul 20 10:57:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 10:57:20 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 11:00:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 11:00:11 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 20 11:03:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 11:03:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 11:04:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 11:04:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 11:05:54 fir-md1-s1 kernel: Lustre: 10502:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 11:05:54 fir-md1-s1 kernel: Lustre: 10502:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 69 previous similar messages Jul 20 11:10:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 11:10:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 20 11:10:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 11:10:13 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 20 11:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 11:13:37 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 11:16:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 11:16:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 11:17:10 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 11:17:10 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 90 previous similar messages Jul 20 11:20:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 11:20:24 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 11:20:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 11:20:24 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 20 11:23:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 11:23:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 11:27:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 049292f9-298c-3fa8-99f8-113a1854bf2e (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16dc66fc00, cur 1563647248 expire 1563647098 last 1563647021 Jul 20 11:27:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 11:28:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 11:28:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 11:30:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 11:30:49 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 20 11:30:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 11:30:50 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 20 11:31:23 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client aec6798f-7efd-7343-c258-45d40bc1c2f9 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4003ab2400, cur 1563647483 expire 1563647333 last 1563647256 Jul 20 11:31:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 11:33:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 11:33:54 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 11:38:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 11:38:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 11:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 11:41:11 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 20 11:41:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 11:41:49 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 11:44:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 11:44:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 11:48:29 fir-md1-s1 kernel: Lustre: 10197:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 11:48:29 fir-md1-s1 kernel: Lustre: 10197:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages Jul 20 11:51:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 11:51:12 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 20 11:51:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 11:51:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 11:52:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 11:52:53 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 11:53:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 069fd880-640d-a841-7acf-ed3fd83eceae (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2107771800, cur 1563648793 expire 1563648643 last 1563648566 Jul 20 11:53:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 11:54:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 11:54:18 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 20 12:01:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 12:01:15 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 20 12:01:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 12:01:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 20 12:04:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 12:04:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 12:04:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 12:04:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 12:10:05 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:10:27 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:10:27 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 20 12:10:48 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:10:48 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 12:11:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 12:11:15 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 20 12:12:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 12:12:17 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 12:14:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 12:14:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 12:15:17 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:15:17 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 71 previous similar messages Jul 20 12:17:06 fir-md1-s1 kernel: Lustre: 
23578:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:17:06 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 30 previous similar messages Jul 20 12:20:26 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:20:26 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Jul 20 12:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 12:21:22 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 20 12:21:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 12:21:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 12:23:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b4700924-89d4-e1b7-14af-57aadcea4e34 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3471bc2000, cur 1563650613 expire 1563650463 last 1563650386 Jul 20 12:23:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 12:24:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 12:24:30 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 20 12:24:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 12:24:52 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 12:29:30 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:29:30 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jul 20 12:31:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 12:31:30 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 20 12:34:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 12:34:51 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 20 12:35:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 12:35:07 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 12:39:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 12:39:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 12:40:27 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:40:27 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 25 previous similar messages Jul 20 12:41:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 12:41:47 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 20 12:45:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 12:45:12 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 20 12:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 12:45:27 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 12:50:37 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 12:50:37 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18245 previous similar messages Jul 20 12:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 12:51:48 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 20 12:55:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 12:55:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 12:56:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f3063e49-4b60-12f0-a4d1-6667b3cdfe1d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f348e2a2000, cur 1563652587 expire 1563652437 last 1563652360 Jul 20 12:56:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 12:57:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 12:57:24 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 12:59:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 12:59:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 13:01:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 508ab9d6-8cda-ad35-66e7-381eefa81e0e (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2405b69800, cur 1563652879 expire 1563652729 last 1563652652 Jul 20 13:01:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:01:27 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 13:01:27 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 94 previous similar messages Jul 20 13:01:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jul 20 13:01:53 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 20 13:05:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 13:05:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 13:06:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 933fb186-1c72-7e27-bcb4-2776992a5416 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2fb9703400, cur 1563653166 expire 1563653016 last 1563652939 Jul 20 13:06:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:10:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 13:10:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 13:11:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 13:11:33 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 20 13:11:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 13:11:57 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 20 13:14:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c7f49fa2-b31d-c6fb-5d95-fc8b5b5cccfc (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32479af800, cur 1563653697 expire 1563653547 last 1563653470 Jul 20 13:14:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:15:33 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 13:15:33 fir-md1-s1 kernel: Lustre: 23578:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 39 previous similar messages Jul 20 13:16:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 13:16:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 13:20:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 11c6f016-ba39-22cf-54cb-416a7eee7671 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3077287400, cur 1563654025 expire 1563653875 last 1563653798 Jul 20 13:20:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:21:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 13:21:33 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 20 13:21:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 13:21:57 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 20 13:22:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 13:22:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 13:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f21f8f23c00, cur 1563654329 expire 1563654179 last 1563654102 Jul 20 13:25:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:26:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 13:26:21 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 13:30:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6ac00a4c-b293-f84d-b749-2893f354f441 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1cd8eec400, cur 1563654604 expire 1563654454 last 1563654377 Jul 20 13:30:48 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 13:30:48 fir-md1-s1 kernel: Lustre: 21421:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 20 13:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 155 seconds. I think it's dead, and I am evicting it. exp ffff8f1761ffe800, cur 1563654680 expire 1563654530 last 1563654525 Jul 20 13:31:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:31:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 13:31:49 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 20 13:32:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 13:32:00 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 20 13:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 13:36:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 13:38:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d6c9a10-70f2-075b-a886-5e4898bdfc17 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2a720ca400, cur 1563655101 expire 1563654951 last 1563654874 Jul 20 13:38:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 13:38:29 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 13:41:23 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 13:41:23 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 31 previous similar messages Jul 20 13:42:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 13:42:18 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 20 13:44:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 549863cb-4f79-82ff-c2c2-c63cf09684d0 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fb8344400, cur 1563655496 expire 1563655346 last 1563655269 Jul 20 13:44:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:45:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 13:45:38 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 20 13:46:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 13:46:42 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 20 13:51:15 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee5543c00, cur 1563655875 expire 1563655725 last 1563655648 Jul 20 13:51:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 13:51:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 13:51:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 13:52:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 13:52:19 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 20 13:56:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 13:56:52 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 20 13:58:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 13:58:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 14:01:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 14:01:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 14:02:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 14:02:28 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 20 14:05:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 772c5014-5dda-c90c-15b9-148d9e089e09 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f263ee42800, cur 1563656731 expire 1563656581 last 1563656504 Jul 20 14:07:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 14:07:35 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 14:08:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 14:08:27 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 14:12:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 14:12:36 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 20 14:17:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 14:17:15 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 14:17:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 14:17:41 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 14:18:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 14:18:31 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 14:22:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 14:22:37 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 20 14:27:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 14:27:47 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 14:28:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 14:28:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 14:29:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 14:29:20 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 20 14:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 14:32:50 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 20 14:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 14:38:08 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 14:40:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5f908099-a42a-aca5-b6a3-1055d7781f65 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3480680c00, cur 1563658802 expire 1563658652 last 1563658575 Jul 20 14:40:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 14:40:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 20 14:40:06 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 20 14:40:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 14:40:37 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 14:42:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 14:42:51 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 20 14:46:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 1d9fd2a1-5fe2-38c8-a892-c55bd2fad38d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f452e30b800, cur 1563659182 expire 1563659032 last 1563658955 Jul 20 14:46:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 14:48:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 14:48:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 14:50:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 14:50:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 14:50:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 14:50:45 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 20 14:53:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 14:53:25 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 20 14:58:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 14:58:56 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 15:00:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5dbbaa44-e170-9331-1575-472fd5e1fc4d (at 10.8.21.21@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1b6f5c0000, cur 1563660016 expire 1563659866 last 1563659789 Jul 20 15:00:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 15:00:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 15:00:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 15:00:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 15:00:47 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 15:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 15:03:25 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 20 15:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3121a9f7-07d2-5074-3d50-72aefc664917 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24effd0000, cur 1563660288 expire 1563660138 last 1563660061 Jul 20 15:04:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 15:09:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 15:09:10 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 15:10:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 15:10:52 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 15:13:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 15:13:28 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 20 15:14:49 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 15:15:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 15:15:02 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 15:19:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 15:19:18 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 15:20:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 15:20:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 15:23:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 15:23:47 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 20 15:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 15:29:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 15:30:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 15:30:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 15:31:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 20 15:31:23 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 20 15:33:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 15:33:58 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 20 15:39:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 15:39:48 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 15:44:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 15:44:01 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 20 15:44:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 15:44:13 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 15:45:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5f79afa2-b004-4ba9-7586-e8605d3a4f9a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ecae1c00, cur 1563662740 expire 1563662590 last 1563662513 Jul 20 15:45:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 15:45:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 15:45:59 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 15:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5b436e9a-6740-d285-d3d0-01888f370fec (at 10.8.21.21@o2ib6) in 189 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2eb8703c00, cur 1563662816 expire 1563662666 last 1563662627 Jul 20 15:46:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 15:47:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 079efd15-b251-73e4-d05f-0fcc3869c5fa (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f40f9fa2800, cur 1563662854 expire 1563662704 last 1563662627 Jul 20 15:47:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 20 15:49:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 15:49:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 15:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 15:54:04 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 20 15:55:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 15:55:59 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 20 15:56:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 14b6f5cb-9754-c5f5-5725-8f2f05c65961 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f284f39a400, cur 1563663393 expire 1563663243 last 1563663166 Jul 20 15:57:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 15:57:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 16:00:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 16:00:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 16:04:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 16:04:05 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 20 16:04:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5504ea99-8fcd-5c3d-6c9e-37144c68239d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fb6b55c00, cur 1563663884 expire 1563663734 last 1563663657 Jul 20 16:04:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 16:06:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 16:06:05 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 16:10:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 16:10:57 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 20 16:12:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 16:12:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 16:14:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 16:14:34 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 20 16:16:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 16:16:06 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 20 16:16:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fcfacd0e-7fa6-9e7a-96f3-00befd709848 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2501b06000, cur 1563664580 expire 1563664430 last 1563664353 Jul 20 16:16:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 16:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 16:21:22 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 16:24:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 84b62466-64ca-f1e5-5703-5b6716dcfc24 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2686660c00, cur 1563665054 expire 1563664904 last 1563664827 Jul 20 16:24:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 16:24:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 16:24:40 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 20 16:27:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 16:27:20 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 20 16:29:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 20 16:29:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 16:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 16:31:27 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 16:34:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 16:34:44 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 20 16:37:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 16:37:28 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 16:38:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4a27345f-d5a9-c659-ba21-6cc337281d8a (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ea441000, cur 1563665927 expire 1563665777 last 1563665700 Jul 20 16:38:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 16:40:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 16:40:59 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 20 16:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 16:41:27 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 16:44:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 16:44:46 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 20 16:48:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 16:48:31 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 16:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 16:51:43 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 16:52:44 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563666757/real 1563666757] req@ffff8f1147843c00 x1636740065653664/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563666764 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 20 16:52:44 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 20 16:52:51 fir-md1-s1 kernel: Lustre: 23585:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563666764/real 1563666764] req@ffff8f1097bdec00 x1636740065654048/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563666771 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 20 16:52:51 fir-md1-s1 kernel: Lustre: 23585:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 20 16:52:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d81a5a7-f20a-27f2-cd0f-57cfd03854d0 (at 
10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d93061000, cur 1563666772 expire 1563666622 last 1563666545 Jul 20 16:52:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 16:54:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 16:54:52 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 20 17:00:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 17:00:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 17:01:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 17:01:59 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 20 17:02:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 17:02:31 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 17:05:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 17:05:06 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 20 17:08:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 20951653-d81c-1d7c-d3da-c7c54ab36ceb (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f43f5a59c00, cur 1563667712 expire 1563667562 last 1563667485 Jul 20 17:08:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 17:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a60a4e48-7d8f-0f83-aa84-1b6379d9e66a (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f220def5400, cur 1563667733 expire 1563667583 last 1563667506 Jul 20 17:08:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 20 17:11:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 17:11:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 17:12:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 17:12:05 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 20 17:12:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 17:12:37 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 17:15:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 17:15:07 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 20 17:22:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 17:22:42 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 17:23:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 17:23:36 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 17:25:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 17:25:38 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 20 17:31:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 17:32:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 17:32:49 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 17:33:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 20 17:33:43 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 17:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 17:35:40 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 20 17:43:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 17:43:02 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 17:44:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 17:44:27 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 20 17:46:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 17:46:09 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 20 17:46:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6b23b799-3bd6-5bdc-fece-ec792984f8f5 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f248efa1400, cur 1563670009 expire 1563669859 last 1563669782 Jul 20 17:48:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 17:48:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 17:53:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 17:53:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 17:54:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 17:54:32 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 17:56:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 17:56:12 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 20 18:03:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 18:03:19 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 18:04:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 18:04:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 18:04:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 18:04:45 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 18:06:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 18:06:27 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 20 18:13:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 18:13:29 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 18:14:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 18:14:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 18:16:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 18:16:45 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 20 18:17:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 18:17:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 20 18:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3bf4a6e7-cc91-edd7-1716-ccf1a401e63c (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f26a42e9c00, cur 1563672081 expire 1563671931 last 1563671854 Jul 20 18:21:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 18:23:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 18:23:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 20 18:26:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 18:26:58 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 20 18:28:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 18:28:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 18:28:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 18:28:54 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 20 18:33:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 18:33:54 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 20 18:37:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 18:37:01 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 20 18:39:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 18:39:14 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 18:39:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 20 18:39:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 18:44:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 18:44:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 18:47:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 18:47:01 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 20 18:49:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 18:49:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 18:52:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 18:52:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 18:54:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 18:54:09 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 18:55:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bea01672-05dd-cafd-cd19-d569ba7a9fcb (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f8a3bc00, cur 1563674122 expire 1563673972 last 1563673895 Jul 20 18:55:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 18:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 18:57:11 fir-md1-s1 kernel: Lustre: Skipped 122 previous similar messages Jul 20 18:59:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 18:59:33 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 20 19:02:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 19:02:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 19:04:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 19:04:18 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 19:07:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 19:07:14 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 20 19:09:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 19:09:34 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 19:13:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 19:13:57 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 20 19:14:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 19:14:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 20 19:17:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 19:17:41 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 20 19:20:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 19:20:04 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 20 19:24:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 19:24:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 19:27:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 19:27:48 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 20 19:30:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 19:30:18 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 20 19:34:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 19:34:31 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 19:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 19:38:06 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 20 19:40:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 19:40:22 fir-md1-s1 kernel: Lustre: Skipped 40 
previous similar messages Jul 20 19:44:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 19:44:59 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 20 19:48:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 19:48:11 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 20 19:49:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 19:50:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2667887800, cur 1563677411 expire 1563677261 last 1563677184 Jul 20 19:50:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 19:50:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 19:50:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 20 19:54:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 19:54:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 19:54:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 19:54:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 19:57:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 19:57:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 19:58:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 19:58:31 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 20 20:01:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 20:01:08 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 20 20:02:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client df81f1c7-0e2e-5ac0-9be8-3a1d01207e94 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2cc3202800, cur 1563678142 expire 1563677992 last 1563677915 Jul 20 20:05:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 20:05:09 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 20:08:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 20:08:32 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 20 20:08:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 20:08:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 20:11:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 20:11:48 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 20:13:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 744f273d-c125-7fee-1a9b-eb0c9d72d5ee (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f8f35bc00, cur 1563678818 expire 1563678668 last 1563678591 Jul 20 20:13:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 20:15:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 20:15:12 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 20 20:18:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 20:18:42 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 20 20:20:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 20:20:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 20:21:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 20:21:52 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 20:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9df27952-21df-c89a-3c57-b788e0a7ef86 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3313a87000, cur 1563679460 expire 1563679310 last 1563679233 Jul 20 20:24:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 20:25:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 20:25:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 20:28:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 20:28:43 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 20 20:31:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 20:31:54 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 20 20:33:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 20:33:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 20:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 20:35:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 20 20:38:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 57f70a50-fb29-a97d-9b5e-6dad5024ac77 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1dff3aa000, cur 1563680312 expire 1563680162 last 1563680085 Jul 20 20:38:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 20:38:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 20:38:46 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 20 20:39:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 21ac2ccc-7d65-e2c3-3f0d-ab90af03b930 (at 10.8.23.14@o2ib6) in 168 seconds. I think it's dead, and I am evicting it. exp ffff8f2d5e489000, cur 1563680388 expire 1563680238 last 1563680220 Jul 20 20:39:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 20:40:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 9c7c42ed-8017-acaa-ce12-1eb59779b50e (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4410f5ec00, cur 1563680447 expire 1563680297 last 1563680220 Jul 20 20:40:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 20 20:42:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 20:42:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 20 20:42:59 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 20:45:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 20:45:28 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 20:48:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 20:48:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 20:48:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 20:48:47 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 20 20:52:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 20:52:02 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 20:55:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 20:55:49 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 20:58:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 20:58:51 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 20 21:03:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 21:03:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 20 21:04:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 21:04:16 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 21:05:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 21:05:57 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 21:09:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 21:09:06 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 20 21:13:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 21:13:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 21:14:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 21:14:24 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 21:15:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d471fdef-9185-5540-2869-a84e2ffe31c1 (at 10.8.21.21@o2ib6) in 223 seconds. I think it's dead, and I am evicting it. exp ffff8f168b4a3400, cur 1563682542 expire 1563682392 last 1563682319 Jul 20 21:15:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d471fdef-9185-5540-2869-a84e2ffe31c1 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24ec956000, cur 1563682546 expire 1563682396 last 1563682319 Jul 20 21:16:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 21:16:23 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 21:16:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client be5c3cb1-dd8e-b386-76b4-025a57605f3c (at 10.8.23.14@o2ib6) in 178 seconds. I think it's dead, and I am evicting it. exp ffff8f21c8617c00, cur 1563682618 expire 1563682468 last 1563682440 Jul 20 21:16:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 20 21:17:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client be5c3cb1-dd8e-b386-76b4-025a57605f3c (at 10.8.23.14@o2ib6) in 182 seconds. I think it's dead, and I am evicting it. exp ffff8f1925937000, cur 1563682622 expire 1563682472 last 1563682440 Jul 20 21:17:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 59336322-89a9-0056-a39e-6b44fa959187 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3a30b5a800, cur 1563682667 expire 1563682517 last 1563682440 Jul 20 21:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 20 21:19:11 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 20 21:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9841752a-f792-7ad6-babf-a3f66fa1763f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c2558ec00, cur 1563682904 expire 1563682754 last 1563682677 Jul 20 21:25:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 21:25:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 21:26:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 21:26:30 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 20 21:26:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 21:26:51 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 21:29:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 21:29:16 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 20 21:36:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 21:36:59 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 20 21:37:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 20 21:37:37 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 21:38:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 21:38:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 21:39:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 21:39:18 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 20 21:47:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 21:47:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 20 21:47:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 21:47:51 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 21:48:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 21:48:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 21:49:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 21:49:40 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 20 21:57:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 21:57:06 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 20 21:58:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 21:58:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 21:59:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 21:59:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 20 21:59:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 21:59:42 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 20 22:07:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 22:07:54 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 20 22:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 20 22:08:15 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 20 22:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 22:09:42 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 20 22:11:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 22:18:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 22:18:17 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 20 22:19:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 22:19:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 22:20:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 20 22:20:50 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 20 22:22:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 20 22:22:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 22:28:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 20 22:28:47 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 22:30:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 20 22:30:42 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 20 22:30:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 22:30:58 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 20 22:32:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 22:32:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 20 22:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 22:38:51 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 22:40:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 22:40:48 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 20 22:41:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 22:41:03 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 20 22:44:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 22:44:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 22:49:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 22:49:08 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 20 22:50:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 22:50:51 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 20 22:51:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 22:51:12 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 20 22:55:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 22:55:10 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 20 22:58:20 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 22:58:20 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 22:59:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 22:59:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 20 23:00:39 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 23:00:39 fir-md1-s1 kernel: Lustre: 21418:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 20 23:00:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 23:00:57 fir-md1-s1 kernel: Lustre: Skipped 48 previous 
similar messages Jul 20 23:01:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 23:01:12 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 20 23:05:14 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 20 23:07:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 23:07:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 20 23:09:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 23:09:33 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 20 23:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 20 23:11:21 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 20 23:12:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 23:12:01 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 20 23:18:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 25cdf7af-ff68-d012-80a1-7216e9113799 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33dfa07800, cur 1563689922 expire 1563689772 last 1563689695 Jul 20 23:18:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 23:19:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 23:19:37 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 20 23:21:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 23:21:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 23:21:40 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 20 23:22:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b951a81d-f917-6a78-d646-bd2586924893 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252131b800, cur 1563690154 expire 1563690004 last 1563689927 Jul 20 23:22:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 23:22:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 23:22:35 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 23:29:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 20 23:29:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 20 23:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9a9cb560-5d2d-658c-bbb9-1a3905543934 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2b0f64c400, cur 1563690664 expire 1563690514 last 1563690437 Jul 20 23:31:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 20 23:31:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 23:31:57 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 20 23:32:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 23:32:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 23:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 23:39:44 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 20 23:41:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 20 23:41:58 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 20 23:42:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 20 23:42:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 20 23:43:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 23:43:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 20 23:50:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 20 23:50:30 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 20 23:52:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 20 23:52:03 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 20 23:53:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 20 23:53:55 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 20 23:54:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 20 23:58:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:01:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 00:01:01 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 00:02:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 00:02:04 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 00:02:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:04:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.21@o2ib6, removing former export from same NID Jul 21 00:04:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 00:05:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 77bcbfe0-f65a-c3cf-8c6a-1a544e5f6d90 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32360c3c00, cur 1563692703 expire 1563692553 last 1563692476 Jul 21 00:05:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 00:11:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 00:11:05 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 00:12:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 00:12:10 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 21 00:15:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:17:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 00:17:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:17:36 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 00:18:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 00:21:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 00:21:06 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 21 00:21:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:22:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:22:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 00:22:18 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 21 00:30:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 00:30:26 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 00:31:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 00:31:40 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 00:32:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 00:32:21 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 21 00:32:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 00:32:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 00:40:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 00:40:29 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 00:40:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:40:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 00:42:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 00:42:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 00:42:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 00:42:33 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 21 00:46:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 00:46:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 00:51:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 00:51:45 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 00:52:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 00:52:37 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 21 00:52:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 00:52:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 00:58:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 00:58:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 01:02:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 01:02:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 21 01:02:38 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 21 01:02:38 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 21 01:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 01:02:54 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 01:09:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 01:09:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 01:13:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 01:13:29 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 01:13:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 01:13:29 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 21 01:13:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 01:13:35 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 01:20:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 01:20:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 01:23:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 01:23:32 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 21 01:23:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 01:23:41 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 01:24:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 01:24:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 01:34:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 01:34:10 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 01:34:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 01:34:10 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 21 01:34:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 01:34:18 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 01:37:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 01:37:08 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 01:44:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 01:44:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 01:44:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 01:44:15 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 21 01:45:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 01:45:29 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 01:50:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 01:50:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 01:54:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 01:54:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 01:54:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 01:54:51 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 21 01:55:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 01:55:35 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 21 02:00:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 02:00:48 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 02:05:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 02:05:12 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 02:05:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 02:05:12 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 21 02:05:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 02:05:40 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 02:13:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 02:13:58 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 02:15:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 02:15:57 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 02:15:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 02:15:57 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 21 02:16:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 02:16:11 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 02:24:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 02:24:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 02:25:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1961caec-1f2d-6506-8e68-dd5d6e52d748 (at 10.8.11.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2501762c00, cur 1563701139 expire 1563700989 last 1563700912 Jul 21 02:25:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 02:26:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 02:26:03 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 21 02:26:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 02:26:16 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 02:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 02:26:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 02:36:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 02:36:04 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 02:36:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 02:36:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 02:36:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 02:36:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 02:37:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 02:37:53 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 21 02:46:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 02:46:04 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 21 02:46:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 02:46:54 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 02:48:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 02:48:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 02:48:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 02:48:51 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 21 02:56:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 02:56:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 21 02:57:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 02:57:03 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 02:59:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 02:59:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 02:59:15 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 02:59:15 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 21 03:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 03:06:45 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 21 03:07:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 03:07:15 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 21 03:09:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 03:09:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 03:12:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 03:12:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 03:17:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 03:17:18 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 03:17:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 03:17:18 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 03:19:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 03:19:27 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 03:27:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 03:27:27 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 03:27:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 03:27:27 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 21 03:28:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 03:28:58 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 03:30:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 03:30:19 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 03:37:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 03:37:49 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 03:37:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 03:37:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 03:40:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 03:40:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 03:40:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 03:40:26 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 03:48:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 03:48:46 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 03:48:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 03:48:46 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 21 03:52:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1811832c00, cur 1563706329 expire 1563706179 last 1563706102 Jul 21 03:52:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 03:54:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 03:54:05 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 03:56:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 03:56:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 03:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 03:58:57 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 03:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 03:58:57 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 21 03:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 176e9a67-cee9-6e92-16c4-349227ce732d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f28295aa400, cur 1563706798 expire 1563706648 last 1563706571 Jul 21 04:04:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 04:04:09 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 21 04:08:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 04:08:58 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 21 04:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 04:09:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 21 04:14:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 04:14:12 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 04:18:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 04:18:59 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 21 04:19:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 04:19:36 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 04:21:14 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 21 04:23:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 04:23:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 04:26:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 21 04:28:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 04:28:26 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 04:29:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 04:29:14 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 04:29:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 04:29:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 04:29:39 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 04:32:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e1f19e99-f1c0-ec21-54a0-4f16dd23103f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3c2b754800, cur 1563708730 expire 1563708580 last 1563708503 Jul 21 04:32:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 04:36:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 04:38:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 04:38:34 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 21 04:39:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 04:39:20 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 04:40:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 04:40:08 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 04:47:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 04:47:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 04:48:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 04:48:36 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 21 04:49:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 04:49:27 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 21 04:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 04:50:31 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 04:58:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 04:58:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 04:59:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 04:59:30 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 21 04:59:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 04:59:40 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 05:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 05:01:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 21 05:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 05:09:50 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 05:10:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 05:10:05 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 21 05:10:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 05:11:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 05:11:23 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 05:13:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 22f2e11b-c2b5-1fe6-f988-a0762c79c9d8 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e1201d000, cur 1563711202 expire 1563711052 last 1563710975 Jul 21 05:13:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 21 05:18:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c2ed7abf-883d-257b-8a66-f4238e668042 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2235869000, cur 1563711491 expire 1563711341 last 1563711264 Jul 21 05:18:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 05:19:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 05:19:52 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 21 05:21:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 05:21:54 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 05:23:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 05:23:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 05:24:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 05:24:05 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 21 05:30:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 05:30:01 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 05:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 05:32:06 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 05:32:37 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 21 05:32:37 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 21 05:32:38 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 21 05:32:38 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 443 previous similar messages Jul 21 05:32:39 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 21 05:32:39 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1351 previous similar messages Jul 21 05:35:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 05:35:46 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 21 05:37:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 21 05:37:06 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 05:37:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 980ec186-5922-bd83-fb0a-cdfdb6097f72 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2def205c00, cur 1563712628 expire 1563712478 last 1563712401 Jul 21 05:37:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 05:37:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 980ec186-5922-bd83-fb0a-cdfdb6097f72 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ef8485800, cur 1563712634 expire 1563712484 last 1563712407 Jul 21 05:37:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 05:38:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 188 seconds. I think it's dead, and I am evicting it. exp ffff8f2396f73000, cur 1563712704 expire 1563712554 last 1563712516 Jul 21 05:40:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 05:40:17 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 05:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 05:42:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 05:45:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f32785bc400, cur 1563713149 expire 1563712999 last 1563712922 Jul 21 05:46:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 05:46:49 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 05:47:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0306afbd-8735-8356-d830-3dbc19a3209e (at 10.8.21.21@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. exp ffff8f2e20182c00, cur 1563713225 expire 1563713075 last 1563713001 Jul 21 05:47:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 05:47:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 05:50:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 05:50:36 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 21 05:52:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 05:52:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 21 05:57:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 05:57:30 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 05:58:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 05:58:18 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 21 06:00:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 06:00:56 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 06:02:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 06:02:17 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 06:08:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 06:08:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 06:10:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 06:10:19 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 21 06:11:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 06:11:04 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 21 06:12:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 06:12:18 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 06:20:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 06:20:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 06:20:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 06:20:42 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 21 06:21:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 06:21:08 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 21 06:22:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 06:22:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 06:30:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 06:30:46 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 06:31:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 06:31:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 06:31:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 06:31:15 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 21 06:32:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 06:32:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 06:40:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 06:40:51 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 21 06:41:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 06:41:21 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 21 06:42:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 06:42:30 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 06:45:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 06:45:35 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 06:50:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 06:50:55 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 21 06:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 06:51:51 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 21 06:53:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 06:53:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 06:57:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 06:57:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 07:01:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 07:01:05 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 21 07:01:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 07:01:58 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 21 07:03:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 07:03:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 07:10:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 07:10:30 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 07:11:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 07:11:40 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 21 07:12:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 07:12:03 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 21 07:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 07:13:41 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 07:15:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f3b76f96-c9a2-205f-016e-24cbee9b687b (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f36573e5400, cur 1563718533 expire 1563718383 last 1563718306 Jul 21 07:15:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 07:20:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 07:20:38 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 07:22:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 07:22:06 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 21 07:22:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 07:22:25 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 21 07:23:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 07:23:48 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 07:31:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 07:31:08 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 21 07:32:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 07:32:09 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 21 07:34:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 07:34:01 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 07:34:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 07:34:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 21 07:42:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 07:42:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 07:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 07:42:24 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 21 07:44:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 07:44:13 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 07:47:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 07:47:27 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 07:49:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2943f174-c69d-a4cc-77c3-76c533d583cf (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f214677ec00, cur 1563720597 expire 1563720447 last 1563720370 Jul 21 07:49:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 07:52:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 07:52:28 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 21 07:53:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 07:53:01 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 07:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 07:54:32 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 07:57:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 07:57:45 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 08:02:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 08:02:32 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 21 08:04:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 08:04:07 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 08:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 08:05:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 08:07:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 08:07:47 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 08:12:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 08:12:46 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Jul 21 08:15:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 08:15:16 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 21 08:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 08:15:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 08:19:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 08:19:34 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 08:23:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 08:23:00 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 21 08:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 68ce014a-1ffd-db04-1790-b33735834722 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22e6e4dc00, cur 1563722595 expire 1563722445 last 1563722368 Jul 21 08:23:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 08:25:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 08:25:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 08:26:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 08:26:07 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 08:29:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 08:29:36 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 21 08:33:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 08:33:13 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 08:36:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 08:36:13 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 21 08:38:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 08:38:52 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 08:40:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 08:40:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 08:43:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 08:43:14 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 08:46:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 08:46:29 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 08:49:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 08:49:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 08:50:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 08:50:57 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 08:53:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 08:53:25 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 21 08:56:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 08:56:50 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 08:57:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5c34f702-13d2-ad9e-15af-5d16e187db67 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250b9b6c00, cur 1563724626 expire 1563724476 last 1563724399 Jul 21 08:57:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 09:01:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 09:01:08 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 21 09:02:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 09:02:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 09:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 09:03:59 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 21 09:07:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 09:07:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 09:11:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 09:11:12 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 21 09:13:47 fir-md1-s1 kernel: Lustre: 21419:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725619/real 1563725619] req@ffff8f0ed7210600 x1636740566825600/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725626 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 21 09:13:47 fir-md1-s1 kernel: Lustre: 21419:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 21 09:13:54 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725626/real 1563725626] req@ffff8f0c61070600 x1636740566825904/t0(0) 
o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725633 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:13:54 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 21 09:13:54 fir-md1-s1 kernel: Lustre: 23600:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f133b785700 x1637105976056912/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1563725639 ref 2 fl Interpret:/0/0 rc 0/0 Jul 21 09:13:54 fir-md1-s1 kernel: Lustre: 23600:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Jul 21 09:13:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 09:13:59 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 21 09:14:01 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725633/real 1563725633] req@ffff8f1458e5a700 x1636740566825936/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725640 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:14:01 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 21 09:14:08 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725640/real 1563725640] req@ffff8f0e27dfc500 x1636740566825968/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725647 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:14:15 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725647/real 1563725647] req@ffff8f0e27dfc500 x1636740566825968/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 
1 dl 1563725654 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:14:15 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 21 09:14:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 09:14:20 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 09:14:29 fir-md1-s1 kernel: Lustre: 21419:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725661/real 1563725661] req@ffff8f0ed7210600 x1636740566825600/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725668 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:14:29 fir-md1-s1 kernel: Lustre: 21419:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 21 09:14:50 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725683/real 1563725683] req@ffff8f1458e5a700 x1636740566825936/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725690 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:14:50 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 21 09:15:25 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563725718/real 1563725718] req@ffff8f0c61070600 x1636740566825904/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563725725 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 09:15:25 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages Jul 21 09:16:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d90a8beb-82e7-caad-48d3-bfb049d10082 (at 
10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f344436d000, cur 1563725772 expire 1563725622 last 1563725545 Jul 21 09:16:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 09:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 161 seconds. I think it's dead, and I am evicting it. exp ffff8f2cbba81c00, cur 1563725848 expire 1563725698 last 1563725687 Jul 21 09:17:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 09:17:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 09:17:30 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 09:21:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1de1e41000, cur 1563726110 expire 1563725960 last 1563725883 Jul 21 09:22:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 09:22:10 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 21 09:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 09:24:07 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 21 09:24:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 09:24:34 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 09:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 09:27:38 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 09:32:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 09:32:50 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 09:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 09:34:24 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 21 09:37:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 09:37:42 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 09:37:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 09:37:54 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 09:44:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 09:44:36 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 09:45:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 09:45:32 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 21 09:48:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 09:48:08 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 21 09:48:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 09:48:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 09:52:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2fe2faab-141d-7ec5-ae00-edc989e212ad (at 10.8.21.21@o2ib6) in 186 seconds. I think it's dead, and I am evicting it. exp ffff8f1de11ba400, cur 1563727976 expire 1563727826 last 1563727790 Jul 21 09:53:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b599a322-153b-fae7-ccb2-5d4953f22d1d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ad13fb800, cur 1563728017 expire 1563727867 last 1563727790 Jul 21 09:54:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 09:54:36 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 21 09:57:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 09:57:48 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 09:58:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 09:58:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 09:58:37 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 09:58:37 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 21 10:04:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 10:04:37 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 10:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 10:08:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 10:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 10:08:51 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 10:10:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 10:10:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 10:14:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 10:14:43 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 21 10:19:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 10:19:03 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 10:20:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 10:20:53 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 21 10:21:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 10:21:25 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 21 10:23:13 fir-md1-s1 kernel: Lustre: 23573:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563729786/real 1563729786] req@ffff8f08332d1e00 x1636740617363488/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563729793 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 21 10:23:13 fir-md1-s1 kernel: Lustre: 23573:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 25 previous similar messages Jul 21 10:23:21 fir-md1-s1 kernel: Lustre: 23701:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f08373a9500 x1637106321061344/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:26/0 lens 480/568 e 1 to 0 dl 1563729806 ref 2 fl Interpret:/0/0 rc 0/0 Jul 21 10:23:21 fir-md1-s1 kernel: Lustre: 23701:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 21 10:23:27 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563729800/real 1563729800] req@ffff8f08333c8f00 x1636740617363520/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563729807 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 10:23:27 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 21 10:23:48 fir-md1-s1 kernel: Lustre: 23573:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563729821/real 1563729821] req@ffff8f08332d1e00 x1636740617363488/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563729828 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 Jul 21 10:23:48 fir-md1-s1 kernel: Lustre: 23573:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 21 10:24:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 683c8f07-788f-0343-2cd4-3268cf51f0af (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32b2378400, cur 1563729850 expire 1563729700 last 1563729623 Jul 21 10:24:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 10:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 288954f3-c9bb-c33d-59df-cdced6ed55e1 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a26013000, cur 1563729853 expire 1563729703 last 1563729626 Jul 21 10:24:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 10:24:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jul 21 10:24:45 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 21 10:29:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 10:29:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 10:31:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 10:31:26 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 10:33:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 10:33:07 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 21 10:34:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 10:34:59 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 21 10:39:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 10:39:56 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 10:42:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 10:42:33 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 10:43:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 10:43:20 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 10:45:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 10:45:43 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 21 10:50:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 10:50:09 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 10:53:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 10:53:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 10:54:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 10:54:18 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 21 10:55:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 10:55:48 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 10:57:31 fir-md1-s1 kernel: Lustre: 23600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563731844/real 1563731844] req@ffff8f0dcaabe900 x1636740630366848/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563731851 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 21 10:57:31 fir-md1-s1 kernel: Lustre: 23600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Jul 21 10:57:38 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563731851/real 1563731851] req@ffff8f0c7def8000 x1636740630366912/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563731858 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 10:57:38 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 21 10:57:39 fir-md1-s1 kernel: Lustre: 23576:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0d3bc01b00 x1637106453649968/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:14/0 lens 480/568 e 1 to 0 dl 1563731864 ref 2 fl Interpret:/0/0 rc 0/0 Jul 21 10:57:39 fir-md1-s1 kernel: Lustre: 23576:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 21 10:57:52 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1563731865/real 1563731865] req@ffff8f0e0985b900 x1636740630366864/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563731872 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 10:57:52 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 21 10:58:13 fir-md1-s1 kernel: Lustre: 20571:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563731886/real 1563731886] req@ffff8f1406ee6000 x1636740630366944/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563731893 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 10:58:13 fir-md1-s1 kernel: Lustre: 20571:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Jul 21 10:58:48 fir-md1-s1 kernel: Lustre: 23605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563731921/real 1563731921] req@ffff8f0e0985d100 x1636740630366992/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563731928 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 10:58:48 fir-md1-s1 kernel: Lustre: 23605:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages Jul 21 10:59:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0246ca8d-62ca-c658-c003-5b12605b8fa1 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c6796c000, cur 1563731942 expire 1563731792 last 1563731715 Jul 21 10:59:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 87b813f3-9146-c920-0e5a-0f4d57df34e7 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f218e948800, cur 1563731951 expire 1563731801 last 1563731724 Jul 21 10:59:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 11:00:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 11:00:50 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 21 11:03:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 11:03:30 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 21 11:05:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 21 11:05:47 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 11:05:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 11:05:54 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 21 11:11:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 11:11:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 11:13:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 11:13:41 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 21 11:15:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 11:15:53 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 21 11:16:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 11:16:02 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 21 11:18:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19f4be4400, cur 1563733118 expire 1563732968 last 1563732891 Jul 21 11:18:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 11:21:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 11:21:25 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 11:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b8b301bb-82bd-1e38-79fe-0318d575e771 (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1deee66400, cur 1563733439 expire 1563733289 last 1563733212 Jul 21 11:26:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 11:26:00 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 11:26:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 11:26:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 21 11:26:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 11:26:07 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 21 11:31:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client eecd36bd-f7c6-fdf3-3e5b-8b6917a9190a (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f08146e9c00, cur 1563733863 expire 1563733713 last 1563733636 Jul 21 11:31:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 11:31:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5e3e3f30-5198-7946-065a-57f3aea1627c (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520216800, cur 1563733866 expire 1563733716 last 1563733639 Jul 21 11:31:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 11:31:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 11:31:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 11:33:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9acaccea-6a03-fb7d-8f87-7d8730439726 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c59181800, cur 1563734022 expire 1563733872 last 1563733795 Jul 21 11:36:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 11:36:20 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 21 11:36:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 11:36:36 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 11:37:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 11:37:58 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 21 11:41:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 11:41:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 11:46:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 11:46:31 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 11:47:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 11:47:44 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 11:49:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 11:49:19 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 21 11:51:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 11:51:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 11:56:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 11:56:33 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 21 11:57:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 11:57:49 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 21 12:01:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 12:01:22 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 21 12:01:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 12:01:52 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 12:06:29 fir-md1-s1 kernel: Lustre: 10505:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563735982/real 1563735982] req@ffff8f13d72ffb00 x1636740661606816/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563735989 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 21 12:06:29 fir-md1-s1 kernel: Lustre: 10505:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages Jul 21 12:06:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 12:06:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 12:06:37 
fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 21 12:06:37 fir-md1-s1 kernel: Lustre: 23706:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f13000d8300 x1637106791736720/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:12/0 lens 480/568 e 1 to 0 dl 1563736002 ref 2 fl Interpret:/0/0 rc 0/0 Jul 21 12:06:37 fir-md1-s1 kernel: Lustre: 23706:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 21 12:06:43 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563735996/real 1563735996] req@ffff8f078786e900 x1636740661606992/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563736003 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 12:06:43 fir-md1-s1 kernel: Lustre: 10197:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 21 12:07:04 fir-md1-s1 kernel: Lustre: 10505:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563736017/real 1563736017] req@ffff8f13d72ffb00 x1636740661606816/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563736024 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 12:07:04 fir-md1-s1 kernel: Lustre: 10505:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 21 12:07:19 fir-md1-s1 kernel: LustreError: 10197:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.21.21@o2ib6) returned error from glimpse AST (req@ffff8f078786e900 x1636740661606992 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f17536172c0/0x5d9ee67d4504da39 lrc: 4/0,0 mode: PW/PW res: [0x2c002c5b9:0x4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.21.21@o2ib6 remote: 0xbf919a4c79606677 expref: 43 pid: 20731 timeout: 0 lvb_type: 0 Jul 21 12:07:19 fir-md1-s1 kernel: 
LustreError: 138-a: fir-MDT0002: A client on nid 10.8.21.21@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Jul 21 12:07:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 358s: evicting client at 10.8.21.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f17536172c0/0x5d9ee67d4504da39 lrc: 4/0,0 mode: PW/PW res: [0x2c002c5b9:0x4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.21.21@o2ib6 remote: 0xbf919a4c79606677 expref: 44 pid: 20731 timeout: 0 lvb_type: 0 Jul 21 12:07:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Jul 21 12:08:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 77e797ed-c007-5e1f-e1b5-17b84e4fe29b (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bfc665400, cur 1563736093 expire 1563735943 last 1563735866 Jul 21 12:08:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 12:09:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 12:09:13 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 21 12:12:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 12:12:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 12:14:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 12:14:05 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 12:16:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 12:16:39 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 21 12:21:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 12:21:52 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 12:22:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 12:22:49 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 12:24:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 12:24:44 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 12:26:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 12:26:57 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 12:28:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0ba42df000, cur 1563737309 expire 1563737159 last 1563737082 Jul 21 12:28:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 12:32:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 12:32:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 12:33:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 12:33:07 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 21 12:35:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 12:35:20 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 21 12:36:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 12:36:57 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 21 12:41:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d963e4a1-366c-6440-7a92-404c8b92d558 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17025c2800, cur 1563738068 expire 1563737918 last 1563737841 Jul 21 12:43:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 12:43:23 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 12:43:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 12:43:49 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 12:46:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 21 12:46:12 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 21 12:47:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 12:47:03 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 21 12:53:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 12:53:29 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 12:55:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 12:55:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 12:56:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 12:56:29 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 21 12:57:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 12:57:13 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 21 13:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 79a9d156-2bbe-f163-d7f4-b7240dc4511f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ae25b7c00, cur 1563739293 expire 1563739143 last 1563739066 Jul 21 13:01:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 13:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 13:03:33 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 13:06:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 07f931dc-ecd6-b3fa-e21e-623a303b3580 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f204eec3000, cur 1563739567 expire 1563739417 last 1563739340 Jul 21 13:06:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 13:06:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 13:06:07 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 21 13:07:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 13:07:18 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 21 13:07:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 13:07:19 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 21 13:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 13:14:00 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 13:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d49f24ce-8fd3-6923-ec73-501f231bf63c (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f285c49d400, cur 1563740131 expire 1563739981 last 1563739904 Jul 21 13:15:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 13:16:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 13:16:15 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 21 13:17:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 13:17:19 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 21 13:21:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 13:21:47 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 13:24:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 13:24:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 13:26:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 13:26:19 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 21 13:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 13:27:40 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 21 13:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c0a89e800, cur 1563740934 expire 1563740784 last 1563740707 Jul 21 13:28:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 13:32:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 13:32:28 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 13:34:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 13:34:56 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 21 13:38:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 13:38:01 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 21 13:39:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 13:39:24 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 21 13:42:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 13:42:46 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 21 13:45:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 13:45:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 21 13:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 13:48:02 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 13:49:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 13:49:47 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 13:50:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1aff52a8-2b78-bfe5-a3bd-99a46dad527e (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2cf1bcc400, cur 1563742204 expire 1563742054 last 1563741977 Jul 21 13:52:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 13:52:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 13:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 13:55:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 13:58:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 13:58:04 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 21 14:01:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 14:01:20 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 21 14:02:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ec1e9e400, cur 1563742946 expire 1563742796 last 1563742719 Jul 21 14:02:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 14:04:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 14:04:50 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 14:05:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 14:05:27 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 14:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7063255b-96a7-c074-9cd7-45fc50e8d513 (at 10.8.21.21@o2ib6) in 215 seconds. I think it's dead, and I am evicting it. 
exp ffff8f329c4ba000, cur 1563743132 expire 1563742982 last 1563742917 Jul 21 14:05:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7063255b-96a7-c074-9cd7-45fc50e8d513 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f329c4b9800, cur 1563743144 expire 1563742994 last 1563742917 Jul 21 14:08:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 14:08:05 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 14:12:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 14:12:12 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 14:15:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 14:15:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 14:15:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 14:15:52 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 14:18:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 14:18:10 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 21 14:22:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 14:22:13 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 21 14:26:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 14:26:03 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 14:28:11 fir-md1-s1 kernel: Lustre: MGS: Connection 
restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 14:28:11 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 21 14:33:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 14:33:14 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 14:34:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 14:34:06 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 14:36:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 14:36:19 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 14:38:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 14:38:38 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 21 14:43:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 14:43:34 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 21 14:46:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 14:46:17 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 14:46:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 14:46:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 14:48:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 14:48:39 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 21 14:55:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 14:55:32 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 14:56:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 14:56:44 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 14:58:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 14:58:50 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 14:59:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 14:59:27 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 15:05:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 15:05:54 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 15:06:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 15:06:48 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 15:08:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 15:08:54 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 21 15:09:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 15:09:54 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 15:16:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 15:16:12 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 15:16:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 15:16:49 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 21 15:18:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 15:18:59 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 21 15:20:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 15:20:27 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 21 15:27:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 15:27:09 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 15:27:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 15:27:40 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 21 15:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 15:29:13 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 21 15:33:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 15:33:08 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 15:37:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 15:37:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 15:38:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 15:38:34 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 21 15:39:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 15:39:23 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 21 15:47:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 15:47:48 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 15:48:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 15:48:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 15:48:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 15:48:59 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 21 15:49:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 15:49:24 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 21 15:58:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 15:58:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 15:59:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 15:59:00 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 21 15:59:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 15:59:25 fir-md1-s1 kernel: Lustre: Skipped 132 previous similar messages Jul 21 16:00:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 16:00:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 16:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 16:08:10 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 16:09:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 16:09:29 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 16:09:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 21 16:09:56 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 16:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 16:18:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 16:18:18 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 16:19:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 16:19:39 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 16:20:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 16:20:12 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 16:22:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 16:22:54 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 16:28:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 16:28:24 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 16:29:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 16:29:51 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 21 16:31:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 16:31:23 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 21 16:36:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 16:36:46 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 21 16:38:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 16:38:29 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 16:40:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 16:40:19 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 21 16:41:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 16:41:27 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 21 16:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f32438a9000, cur 1563752760 expire 1563752610 last 1563752533 Jul 21 16:46:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 16:48:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 16:48:47 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 16:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ca16d7400, cur 1563753003 expire 1563752853 last 1563752776 Jul 21 16:50:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 16:50:35 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 21 16:52:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 16:52:24 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 21 16:52:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 16:52:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 16:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 16:58:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 17:00:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 17:00:37 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 21 17:03:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 17:03:39 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 17:08:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 17:08:40 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 17:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 17:08:54 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 17:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 17:10:53 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 21 17:13:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6ca9ec04-0c27-0123-c1e9-913bb5afc354 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f16e7323800, cur 1563754413 expire 1563754263 last 1563754186 Jul 21 17:13:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 17:13:45 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 21 17:19:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 17:19:08 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 17:19:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 17:19:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 17:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 17:20:53 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 21 17:24:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 17:24:22 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 21 17:29:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 17:29:14 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 17:30:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 17:30:54 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 21 17:31:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 17:31:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 17:36:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b1121cc00, cur 1563755812 expire 1563755662 last 1563755585 Jul 21 17:36:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 17:37:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 17:37:41 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 17:39:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 17:39:26 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 17:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 17:41:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 17:41:11 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 17:41:11 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 21 17:47:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 17:47:42 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 17:49:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 17:49:29 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 17:51:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 17:51:16 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 17:51:18 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563756671/real 1563756671] req@ffff8f0ea12bbf00 x1636740865774944/t0(0) o106->fir-MDT0000@10.8.29.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563756678 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 21 17:51:18 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 21 17:51:25 fir-md1-s1 kernel: Lustre: 20458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563756678/real 1563756678] req@ffff8f0ea12bbf00 x1636740865774944/t0(0) o106->fir-MDT0000@10.8.29.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563756685 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 21 17:51:26 fir-md1-s1 kernel: Lustre: 10589:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f11b4cf0000 x1637108306925248/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:1/0 lens 480/568 e 1 to 0 dl 1563756691 ref 2 fl Interpret:/0/0 rc 0/0 Jul 21 17:51:26 fir-md1-s1 kernel: Lustre: 
10589:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 21 17:51:32 fir-md1-s1 kernel: Lustre: 20458:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f11b4cf0000 x1637108306925248/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:1/0 lens 480/536 e 1 to 0 dl 1563756691 ref 1 fl Complete:/0/0 rc 301/301 Jul 21 17:51:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.29.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 17:51:44 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 21 17:58:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 17:58:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 21 17:58:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fd405916-db19-f360-5fcb-55570ae1ded5 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19697d1800, cur 1563757135 expire 1563756985 last 1563756908 Jul 21 17:59:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 17:59:57 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 18:01:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 18:01:31 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 21 18:04:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 18:04:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 18:09:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 18:09:47 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 21 18:10:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 18:10:02 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 18:11:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 18:11:34 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 21 18:16:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 18:16:36 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 18:20:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 18:20:07 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 21 18:20:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 18:20:34 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 18:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 18:21:44 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 21 18:29:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 18:30:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 18:30:13 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 18:30:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 18:30:52 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 21 18:31:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 18:31:50 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 21 18:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 757d5366-1bd8-8e3a-adb5-44b24defcb60 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17992d4000, cur 1563759594 expire 1563759444 last 1563759367 Jul 21 18:39:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 18:40:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 18:40:29 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 18:40:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 18:40:57 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 18:41:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 18:41:51 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 21 18:44:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 18:44:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 18:45:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bc082040-6d59-e17c-03ca-5929096a6008 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1812310800, cur 1563759943 expire 1563759793 last 1563759716 Jul 21 18:45:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 18:50:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 18:50:34 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 21 18:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 18:51:23 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 21 18:51:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 18:51:55 fir-md1-s1 kernel: Lustre: Skipped 136 previous similar messages Jul 21 18:52:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e17e3dc4-b6f5-509a-a026-ebdda0b68853 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f398cfef400, cur 1563760356 expire 1563760206 last 1563760129 Jul 21 18:52:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 18:56:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 18:56:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 19:01:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 19:01:09 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 19:01:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 19:01:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 19:02:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 19:02:00 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 21 19:11:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 19:11:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 19:11:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 19:11:39 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 21 19:12:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 19:12:06 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 21 19:19:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 19:19:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 19:21:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 19:21:19 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 21 19:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 19:21:55 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 19:22:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 19:22:13 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 21 19:25:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 19:25:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 19:31:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 19:31:55 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 19:32:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 19:32:14 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 21 19:32:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 19:32:26 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 21 19:34:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ad5ea837-9de8-2243-1056-1f955f2fb736 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2d24b74c00, cur 1563762856 expire 1563762706 last 1563762629 Jul 21 19:34:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 19:41:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 19:41:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 19:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 19:42:01 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 21 19:42:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 19:42:15 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 21 19:42:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 19:42:31 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 21 19:44:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 48826004-1730-eec2-e54d-6519cf05f1e4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2bfb520400, cur 1563763470 expire 1563763320 last 1563763243 Jul 21 19:44:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 19:49:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 19:49:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1fa32219-218d-9dfd-331d-e2320a965eba (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2521aec400, cur 1563763761 expire 1563763611 last 1563763534 Jul 21 19:49:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 19:52:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 19:52:09 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 19:52:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 19:52:18 fir-md1-s1 kernel: Lustre: Skipped 132 previous similar messages Jul 21 19:52:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 19:52:36 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 21 19:56:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2373438800, cur 1563764177 expire 1563764027 last 1563763950 Jul 21 19:56:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 20:02:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 20:02:29 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 20:02:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 20:02:29 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 21 20:02:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 20:02:53 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 21 20:05:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:12:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 20:12:53 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 20:12:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 20:12:53 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 21 20:13:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 20:13:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 20:17:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:22:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 20:22:58 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 21 20:23:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 20:23:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 21 20:23:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 20:23:44 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 21 20:30:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:32:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:33:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 20:33:13 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 21 20:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 20:33:14 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 20:35:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 20:35:31 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 21 20:38:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:40:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:40:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:42:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 20:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 20:43:22 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 20:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 20:43:22 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 21 20:45:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:46:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 20:46:11 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 20:46:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 20:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 20:53:27 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 21 20:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 21 20:53:27 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 21 20:53:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 20:56:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 20:56:12 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 21 20:58:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2429e67000, cur 1563767888 expire 1563767738 last 1563767661 Jul 21 21:00:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 21:00:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 21:03:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 21:03:45 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 21:03:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 21:03:45 fir-md1-s1 kernel: Lustre: Skipped 132 previous similar messages Jul 21 21:06:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 21:06:20 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 21 21:08:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 21:13:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 21:13:53 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 21 21:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 21:13:54 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 21:17:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 21:17:02 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 21:22:15 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28372f2000, cur 1563769335 expire 1563769185 last 1563769108 Jul 21 21:23:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 21:23:56 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 21 21:24:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 21 21:24:17 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 21:26:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 21:26:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 21:28:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 21:28:14 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 21:34:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 21:34:12 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 21 21:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 21:34:45 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 21:38:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 21:38:21 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 21:39:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 21:39:53 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 21:44:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 21:44:20 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 21 21:44:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 21:44:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 21:48:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 21:48:28 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 21:51:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 21:51:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 21 21:54:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 21:54:36 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 21 21:54:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 21:54:52 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 21 21:58:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 21 21:58:51 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 21 22:02:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 22:02:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 22:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 22:04:51 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 21 22:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 22:05:13 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 22:08:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b7c9b8e1-5df9-522e-5650-56f3bc11c51e (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f2c029c00, cur 1563772101 expire 1563771951 last 1563771874 Jul 21 22:10:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 21 22:10:59 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 21 22:14:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 22:14:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 22:14:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 22:14:57 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 21 22:15:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 22:15:14 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 21 22:22:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 22:22:05 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 22:25:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 22:25:13 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 21 22:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 22:25:18 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 22:27:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 22:27:10 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 22:32:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 22:32:20 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 21 22:35:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 22:35:19 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 21 22:35:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 22:35:33 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 21 22:36:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7a1dde60-7396-cd6a-91da-fc42c3921a42 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f12ddfc0800, cur 1563773799 expire 1563773649 last 1563773572 Jul 21 22:36:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 22:37:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 22:37:36 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 22:43:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 22:43:11 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 21 22:45:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 21 22:45:19 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 21 22:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 21 22:46:01 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 21 22:47:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 22:47:48 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 22:52:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6ae9084d-f931-4dde-3e70-9236fdb25775 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f282d3d1800, cur 1563774754 expire 1563774604 last 1563774527 Jul 21 22:52:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 22:52:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6ae9084d-f931-4dde-3e70-9236fdb25775 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4537851000, cur 1563774771 expire 1563774621 last 1563774544 Jul 21 22:52:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 21 22:53:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 22:53:18 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 21 22:55:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 658a1ab7-d1a8-618b-47d9-b1878781a97a (at 10.8.14.4@o2ib6) Jul 21 22:55:27 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 21 22:56:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 22:56:24 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 21 23:01:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 23:01:51 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 21 23:03:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 23:03:22 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 21 23:05:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 21 23:05:33 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 21 23:06:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 21 23:06:56 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 23:13:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 23:13:25 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 21 23:14:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 23:14:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 21 23:15:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 23:15:48 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 21 23:17:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 23:17:11 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 21 23:23:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 23:23:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 21 23:25:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 23:25:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 21 23:25:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 23:25:51 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 21 23:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f476c104-56d1-47c5-a709-8699802881a4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c05c3e800, cur 1563776794 expire 1563776644 last 1563776567 Jul 21 23:27:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 23:27:13 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 21 23:35:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 23:35:00 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 21 23:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 23:35:53 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 21 23:35:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 21 23:35:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 21 23:37:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 21 23:37:25 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 21 23:44:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2edd4b0000, cur 1563777874 expire 1563777724 last 1563777647 Jul 21 23:44:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 21 23:45:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 21 23:45:50 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 21 23:45:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 21 23:45:59 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 21 23:47:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 23:47:39 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 21 23:50:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3e5ef8d400, cur 1563778207 expire 1563778057 last 1563777980 Jul 21 23:52:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 21 23:52:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 21 23:55:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 21 23:55:52 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 21 23:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 21 23:56:21 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 21 23:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 21 23:57:43 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 21 23:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e36805a8-58bf-4b76-db3d-d2c31ccfc81a (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ad1363400, cur 1563778741 expire 1563778591 last 1563778514 Jul 22 00:02:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 00:02:52 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 00:06:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 236a2248-9ebd-fd5c-af3c-9f1bc6c23695 (at 10.9.113.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fc82c800, cur 1563779171 expire 1563779021 last 1563778944 Jul 22 00:06:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 00:06:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 00:06:43 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 22 00:06:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 00:06:51 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 00:07:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 00:07:46 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 22 00:14:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 00:14:28 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 00:16:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 00:16:57 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 22 00:16:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 00:16:59 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 22 00:17:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 00:17:47 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 00:18:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 78758270-5e18-d341-7ccf-c6c63eb0a1cf (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f398378e400, cur 1563779884 expire 1563779734 last 1563779657 Jul 22 00:18:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 00:26:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 00:26:57 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 22 00:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 00:28:18 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 22 00:29:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 00:29:53 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 22 00:30:15 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 88e7a994-7aa8-eeb3-12c3-00c5936c7264 (at 10.8.23.14@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. exp ffff8f41329c3800, cur 1563780615 expire 1563780465 last 1563780398 Jul 22 00:30:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 00:30:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0460a896-b313-cdc8-8ac0-edfd59c25072 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a8aac1c00, cur 1563780625 expire 1563780475 last 1563780398 Jul 22 00:36:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 00:36:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 00:37:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 00:37:12 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 00:38:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 00:38:28 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 00:39:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 00:40:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 00:40:24 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 22 00:43:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 00:43:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 00:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 00:47:24 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 22 00:48:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 00:48:41 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 00:50:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 00:50:14 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 00:51:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 00:51:26 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 00:57:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 00:57:33 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 22 00:59:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 00:59:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 01:01:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 01:01:30 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 22 01:01:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 22 01:01:37 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 22 01:07:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 01:07:41 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 22 01:07:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client bdd52bee-060a-853d-0426-5773ec43d244 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27d0a5c800, cur 1563782879 expire 1563782729 last 1563782652 Jul 22 01:07:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 01:09:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 01:09:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 22 01:10:50 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783043/real 1563783043] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783050 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 01:10:57 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783050/real 1563783050] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783057 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:10:58 fir-md1-s1 kernel: Lustre: 23561:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e3de6bf00 x1637109429366880/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:3/0 lens 480/568 e 1 to 0 dl 1563783063 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 01:11:04 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783057/real 1563783057] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783064 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:11:11 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783064/real 1563783064] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783071 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:11:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6397f75c-d067-d870-a10a-85a4ad3a3017 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24cd516000, cur 1563783076 expire 1563782926 last 1563782849 Jul 22 01:11:16 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 22 01:11:25 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783078/real 1563783078] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783085 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:11:25 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 22 01:11:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 01:11:43 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 22 01:11:46 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783099/real 1563783099] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783106 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:11:46 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 22 01:12:21 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783134/real 1563783134] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783141 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:12:21 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 22 01:12:28 fir-md1-s1 
kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 01:12:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 01:13:31 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563783204/real 1563783204] req@ffff8f0e8ac0b900 x1636741119700992/t0(0) o106->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563783211 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 01:13:31 fir-md1-s1 kernel: Lustre: 20738:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 22 01:14:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 95c23571-6ded-28b5-8b2e-63d85e709c23 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1feea9e000, cur 1563783241 expire 1563783091 last 1563783014 Jul 22 01:14:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 01:17:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 01:17:43 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 01:19:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 01:19:20 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 22 01:21:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 01:21:49 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 01:22:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 01:22:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 01:24:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8ed0bd48-60ce-b6dd-2d16-2def53d74fc4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f331de07400, cur 1563783857 expire 1563783707 last 1563783630 Jul 22 01:24:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 01:28:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 01:28:10 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 22 01:29:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 01:29:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 01:31:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 01:31:54 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 22 01:33:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 01:33:20 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 01:38:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 01:38:33 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 22 01:40:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 01:40:47 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 01:42:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 01:42:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 22 01:43:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 01:43:47 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 22 01:48:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 01:48:37 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 22 01:51:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 01:51:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 01:53:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 01:53:37 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 22 01:54:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 01:54:58 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 01:58:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 01:58:43 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 22 02:01:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 02:01:09 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 02:04:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bde5e4ec-989d-ae73-3eca-399c2b85d190 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f39a52a1800, cur 1563786261 expire 1563786111 last 1563786034 Jul 22 02:04:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 02:04:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 02:04:46 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 22 02:06:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 02:06:23 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 02:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 02:09:34 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 22 02:09:57 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0a2a66f800, cur 1563786597 expire 1563786447 last 1563786370 Jul 22 02:09:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 02:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 02:11:12 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 02:14:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 02:14:52 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 02:20:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 02:20:00 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 02:20:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 02:20:01 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 22 02:21:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 02:21:45 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 02:26:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 02:26:05 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 02:30:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 02:30:04 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 22 02:30:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 02:30:52 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 02:32:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 02:32:01 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 02:36:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 02:36:27 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 22 02:40:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 02:40:17 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 22 02:42:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 02:42:17 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 02:43:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 02:43:24 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 22 02:46:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 76e0eaeb-2112-a1fd-2e36-de6dd0abe460 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2619f41c00, cur 1563788781 expire 1563788631 last 1563788554 Jul 22 02:46:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 02:46:27 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 22 02:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 02:50:31 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 22 02:53:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 02:53:33 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 02:54:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 02:54:06 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 22 02:56:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 02:56:28 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 22 03:00:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 03:00:34 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 22 03:01:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7d52bd92-6094-5f44-dc9e-a1cb12244b83 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f8ccdf400, cur 1563789667 expire 1563789517 last 1563789440 Jul 22 03:01:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 03:03:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 03:03:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 03:05:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 03:05:05 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 22 03:06:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 03:06:53 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 03:10:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 03:10:43 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 22 03:13:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 03:13:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 03:16:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 03:16:22 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 03:17:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 03:17:17 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 22 03:20:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 03:20:50 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 22 03:24:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 03:24:54 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 22 03:26:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 03:26:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 03:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 03:27:19 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 22 03:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 03:31:20 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 22 03:35:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 03:35:04 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 03:36:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a544e17c-6d6b-4cd0-9cba-5ffa115eb4f5 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2371f1ec00, cur 1563791792 expire 1563791642 last 1563791565 Jul 22 03:36:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 03:36:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d82ce290-bdbf-3301-ba93-8ca0218961ba (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f26b8e08c00, cur 1563791794 expire 1563791644 last 1563791567 Jul 22 03:36:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 03:36:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 03:36:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 03:38:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 03:38:44 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 03:41:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 03:41:57 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 22 03:45:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 03:45:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 22 03:48:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 03:48:46 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 03:50:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 47609cdf-662e-204f-0786-34eb51f8ce6b (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c9eece000, cur 1563792653 expire 1563792503 last 1563792426 Jul 22 03:51:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 03:51:29 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 22 03:51:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 03:51:59 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 03:52:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0a907780-08b3-7d25-985f-2055ad214223 (at 10.8.23.14@o2ib6) in 203 seconds. I think it's dead, and I am evicting it. exp ffff8f2d9f054800, cur 1563792729 expire 1563792579 last 1563792526 Jul 22 03:52:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 03:52:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5a9f6ca1-1d3d-5763-6970-1c5b675211e9 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4501194c00, cur 1563792753 expire 1563792603 last 1563792526 Jul 22 03:52:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 03:56:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 03:56:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 04:00:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 04:00:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 04:01:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 04:01:36 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 04:01:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 04:01:59 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 22 04:06:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 04:06:13 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 04:09:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fc77916b-3eb3-1d72-b3b1-e89a1c151f1b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148b3a3000, cur 1563793763 expire 1563793613 last 1563793536 Jul 22 04:12:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 04:12:00 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 22 04:13:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 04:13:38 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 04:16:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 04:16:15 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 04:17:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 04:17:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 04:19:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bef3cec00, cur 1563794381 expire 1563794231 last 1563794154 Jul 22 04:19:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 04:22:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 04:22:04 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 22 04:24:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 04:24:07 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 22 04:26:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 04:26:16 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 22 04:30:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34e3fe7c00, cur 1563795028 expire 1563794878 last 1563794801 Jul 22 04:31:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 04:31:20 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 04:32:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 04:32:15 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 04:35:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 04:35:10 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 22 04:36:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 04:36:51 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 22 04:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9ed6d838-824f-536a-182a-e7c88465178f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fd3c35000, cur 1563795625 expire 1563795475 last 1563795398 Jul 22 04:42:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 04:42:53 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 22 04:43:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 04:43:29 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 04:46:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 04:46:24 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 22 04:46:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 04:46:59 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 04:52:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 04:52:54 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 22 04:53:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 04:53:54 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 04:57:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 04:57:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 04:57:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 04:57:37 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 22 05:02:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 05:02:54 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 22 05:04:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 05:04:15 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 22 05:05:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6114d04e-3809-cfab-2d7f-a0c4c0932243 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a5cfd4400, cur 1563797159 expire 1563797009 last 1563796932 Jul 22 05:05:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 05:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 05:07:55 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 22 05:09:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 05:09:58 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 05:12:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 05:12:59 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 22 05:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f1336f0a-2e89-b3a4-2363-d7972f5cf609 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f346b945400, cur 1563797681 expire 1563797531 last 1563797454 Jul 22 05:14:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 05:15:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dd2bb4f7-180e-c3f5-cc85-1556da159a54 (at 10.8.23.14@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30a0ee7000, cur 1563797757 expire 1563797607 last 1563797540 Jul 22 05:15:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 05:18:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 05:18:07 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 05:19:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 05:19:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 05:20:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 05:20:00 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 22 05:23:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 05:23:06 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 22 05:28:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 05:28:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 05:31:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 05:31:30 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 22 05:31:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 05:31:35 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 22 05:32:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6e16f53b-1962-70ee-edf9-82bc827b1f9a (at 10.8.21.21@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f28bafa1c00, cur 1563798745 expire 1563798595 last 1563798518 Jul 22 05:32:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 05:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 05:33:19 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 22 05:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 05:38:13 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 22 05:41:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 05:41:31 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 05:42:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 05:42:50 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 22 05:43:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 05:43:19 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 22 05:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 05:48:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 05:49:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f5a64c48-71fc-1cc4-14f0-16870e0ae739 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0659092c00, cur 1563799760 expire 1563799610 last 1563799533 Jul 22 05:49:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 05:49:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 038f6ab4-29f6-3b65-0165-bb3d02909450 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f288f83dc00, cur 1563799780 expire 1563799630 last 1563799553 Jul 22 05:49:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 05:52:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 05:52:28 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 22 05:53:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 05:53:19 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 22 05:56:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 75362ccf-8d5b-5d15-4863-09304db1bc92 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f333cd0cc00, cur 1563800172 expire 1563800022 last 1563799945 Jul 22 05:58:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 05:58:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 06:01:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 06:01:13 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 06:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 06:03:25 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 22 06:03:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 06:03:38 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 22 06:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 06:08:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 06:12:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 06:12:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 06:13:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 06:13:32 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 22 06:16:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 06:16:11 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 22 06:19:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 06:19:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 06:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 06:23:41 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 22 06:24:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 06:24:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 06:26:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 06:26:14 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 22 06:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 06:29:35 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 06:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4df09f47-f374-d84b-8a81-ff813c691a2c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ae2366000, cur 1563802407 expire 1563802257 last 1563802180 Jul 22 06:33:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 06:34:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 06:34:00 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 06:34:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 06:34:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 06:36:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 06:36:23 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 22 06:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 06:39:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 06:42:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bced00bd-ab54-6d3c-481c-222b6c054c3d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f192caa2000, cur 1563802954 expire 1563802804 last 1563802727 Jul 22 06:42:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 06:42:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bced00bd-ab54-6d3c-481c-222b6c054c3d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f192caa2800, cur 1563802975 expire 1563802825 last 1563802748 Jul 22 06:42:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 06:44:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 06:44:03 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 22 06:44:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 06:44:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 06:47:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 06:47:59 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 22 06:51:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 06:51:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 06:54:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 06:54:18 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 22 06:57:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 06:57:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 06:59:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 06:59:45 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 07:01:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 07:01:46 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 07:04:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 07:04:19 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 22 07:10:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 07:10:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 07:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 07:11:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 22 07:12:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 07:12:27 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 22 07:14:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 07:14:55 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 07:15:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6008763e-8b99-7142-5e83-395b185755e9 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2bd3bf0400, cur 1563804922 expire 1563804772 last 1563804695 Jul 22 07:22:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 07:22:39 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 07:23:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 07:23:42 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 07:24:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 07:24:57 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 22 07:29:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 07:29:21 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 07:33:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 07:33:31 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 22 07:34:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 07:34:37 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 07:34:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 07:34:57 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 22 07:41:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 07:41:00 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 07:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 07:43:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 07:44:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 07:44:57 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 22 07:44:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 07:44:57 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 22 07:53:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 07:53:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 07:55:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 
(no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 07:55:04 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 07:55:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 07:55:33 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 22 07:56:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 07:56:25 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 22 07:57:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f281629fc00, cur 1563807428 expire 1563807278 last 1563807201 Jul 22 07:57:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 08:00:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e65a6904-be5d-e834-42ea-2c951e3983a3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3330335000, cur 1563807614 expire 1563807464 last 1563807387 Jul 22 08:04:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 08:04:06 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 08:04:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 73073a19-e259-0ae4-a4c9-dc8c57df0e18 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34e4405c00, cur 1563807850 expire 1563807700 last 1563807623 Jul 22 08:04:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 08:04:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 73073a19-e259-0ae4-a4c9-dc8c57df0e18 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34e4407400, cur 1563807854 expire 1563807704 last 1563807627 Jul 22 08:04:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 08:05:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 08:05:10 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 22 08:05:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 08:05:35 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 22 08:07:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 08:07:20 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 22 08:14:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9c1f911c-3f22-7e20-b668-817788c06348 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4530bbcc00, cur 1563808447 expire 1563808297 last 1563808220 Jul 22 08:14:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 08:14:12 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 08:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 08:15:35 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 22 08:16:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 08:16:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 08:19:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 08:19:42 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 22 08:25:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 08:25:16 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 08:25:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 08:25:42 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 22 08:26:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 08:26:34 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 08:30:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 08:30:05 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 22 08:35:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 08:35:35 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 08:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 08:35:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 22 08:37:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 08:37:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 08:38:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fde894400, cur 1563809909 expire 1563809759 last 1563809682 Jul 22 08:38:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 08:41:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 08:41:51 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 08:43:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e1c1e2000, cur 1563810222 expire 1563810072 last 1563809995 Jul 22 08:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 08:46:01 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 22 08:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 08:46:01 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 22 08:48:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 08:48:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 08:51:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 08:51:53 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 08:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 08:56:06 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 22 08:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 08:56:06 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 22 08:59:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 08:59:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 09:02:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 09:02:01 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 22 09:06:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 09:06:37 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 09:06:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 09:06:37 fir-md1-s1 kernel: Lustre: Skipped 123 previous similar messages Jul 22 09:07:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8e31cdf4-7d08-ee7b-161f-7d1cc445e92e (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3510643400, cur 1563811624 expire 1563811474 last 1563811397 Jul 22 09:10:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 09:10:40 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 09:12:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 09:12:08 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 22 09:16:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 09:16:40 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 22 09:16:42 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a2481ca2-5954-b8ed-554d-d1a8a675793f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0a61df1000, cur 1563812202 expire 1563812052 last 1563811975 Jul 22 09:16:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 09:16:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5f220284-ae9b-de67-0f41-b4219b455c90 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f82a4cc00, cur 1563812209 expire 1563812059 last 1563811982 Jul 22 09:16:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 09:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 09:17:22 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 09:22:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 09:22:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 22 09:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 09:26:42 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 09:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 09:27:40 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 09:28:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 09:28:46 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 09:32:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 09:32:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 22 09:36:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 09:36:44 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 09:38:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 09:38:46 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 09:42:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 09:42:43 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 22 09:46:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 09:46:44 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 22 09:49:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 09:49:01 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 09:49:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 09:49:02 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 09:51:02 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 22 09:51:02 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Jul 22 09:52:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 09:55:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 09:55:02 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 22 09:56:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 09:56:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 09:56:52 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 22 09:59:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 09:59:15 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 10:03:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0df79b6c-8d00-d8d8-218a-77e1e070b5f1 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0fc13bd000, cur 1563815028 expire 1563814878 last 1563814801 Jul 22 10:04:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 10:04:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 10:06:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 10:06:22 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 10:06:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ecb4cd0a-22be-83d1-1912-7cac8317b9ee (at 10.8.18.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f397c3bc400, cur 1563815195 expire 1563815045 last 1563814968 Jul 22 10:06:35 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 22 10:06:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 10:06:53 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 10:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 10:09:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 10:16:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 10:16:30 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 10:16:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 10:16:57 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 22 10:19:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 10:19:59 fir-md1-s1 kernel: Lustre: Skipped 34 previous 
similar messages Jul 22 10:21:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 10:21:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 10:26:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 10:26:32 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 10:27:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 10:27:04 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 22 10:31:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 10:31:03 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 10:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9dc885e5-28c9-8635-a9e9-ca3469b12d9d (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e93ca9c00, cur 1563816997 expire 1563816847 last 1563816770 Jul 22 10:36:37 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 22 10:36:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 10:36:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 10:36:37 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 10:36:37 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 22 10:37:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 10:37:08 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 22 10:40:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6a8b96bd-c186-5568-c26c-3a58ee40a8dc (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f170f3cd800, cur 1563817239 expire 1563817089 last 1563817012 Jul 22 10:40:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 10:41:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 10:41:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 10:45:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 87161ce4-0274-a426-1642-aee65588bf59 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32fac70400, cur 1563817526 expire 1563817376 last 1563817299 Jul 22 10:45:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 10:47:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 10:47:42 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 22 10:48:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 10:48:07 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 10:49:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 10:49:14 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 22 10:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 10:51:26 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 22 10:57:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 10:57:47 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 22 10:59:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 10:59:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 10:59:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 10:59:21 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 22 11:01:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 11:01:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 11:07:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 11:07:50 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 11:09:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 11:09:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 11:09:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 11:09:32 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 11:10:51 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 11:10:51 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 382 previous similar messages Jul 22 11:10:58 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 11:10:58 fir-md1-s1 kernel: Lustre: 21312:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jul 22 11:11:29 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 11:11:29 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 22 11:12:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 11:12:04 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 11:18:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 11:18:24 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 22 11:20:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 11:20:23 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 22 11:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 11:22:29 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 11:22:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 11:22:33 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 22 11:28:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 11:28:56 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 22 11:30:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 11:30:43 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 11:32:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 11:32:30 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 11:32:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f328a5d8c00, cur 1563820363 expire 1563820213 last 1563820136 Jul 22 11:32:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 11:33:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 170 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f53031c00, cur 1563820439 expire 1563820289 last 1563820269 Jul 22 11:35:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 11:35:47 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 22 11:38:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 11:38:59 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 22 11:42:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 11:42:43 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 22 11:45:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 11:45:50 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 11:49:13 fir-md1-s1 kernel: Lustre: 23565:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 11:49:13 fir-md1-s1 kernel: Lustre: 23565:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jul 22 11:49:17 fir-md1-s1 kernel: Lustre: 23644:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 11:49:17 fir-md1-s1 kernel: Lustre: 23644:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 22 11:49:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 11:49:20 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 22 11:52:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 11:52:51 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 11:53:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 11:53:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 11:56:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 11:56:19 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 11:59:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 11:59:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 11:59:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 11:59:31 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 22 12:02:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 12:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 12:04:03 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 12:07:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 12:07:33 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 22 12:09:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 12:09:36 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 22 12:12:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 12:12:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 12:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 12:14:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 12:14:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b041b9f7-2fef-a002-564a-d4216355a2a2 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f264742ac00, cur 1563822855 expire 1563822705 last 1563822628 Jul 22 12:19:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 12:19:30 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 22 12:19:31 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:19:31 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 22 12:19:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 12:19:40 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 12:20:52 fir-md1-s1 kernel: Lustre: 23565:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:20:52 fir-md1-s1 kernel: Lustre: 23565:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 22 12:23:45 fir-md1-s1 kernel: Lustre: 23572:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:24:03 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:24:03 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jul 22 12:24:07 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:24:07 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 813 previous similar messages Jul 22 12:24:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 12:24:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 
12:24:42 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:24:42 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 362 previous similar messages Jul 22 12:25:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 12:25:27 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 12:26:42 fir-md1-s1 kernel: Lustre: 23688:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 12:29:41 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 22 12:31:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 22 12:31:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 12:32:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ceb59791-07bb-5c68-6ef2-ef73c72c60f9 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f72772400, cur 1563823922 expire 1563823772 last 1563823695 Jul 22 12:32:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 12:33:56 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 12:35:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 12:35:00 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 12:37:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 12:37:03 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 12:39:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 12:39:47 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 22 12:42:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 12:42:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 12:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 12:45:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 12:49:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 12:49:50 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 22 12:52:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 12:52:22 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 22 12:55:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 12:55:14 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 12:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 12:55:42 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 13:00:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 13:00:14 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 22 13:03:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 13:03:07 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 22 13:05:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 13:05:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 13:08:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 13:08:58 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 13:10:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 13:10:17 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 22 13:13:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 13:13:51 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 22 13:16:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 13:16:11 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 13:20:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 13:20:54 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 22 13:21:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 13:21:32 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 13:24:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 13:24:16 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 22 13:26:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 13:26:31 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 13:30:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 13:30:56 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 13:32:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 13:32:40 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 13:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f49dd8c00, cur 1563827607 expire 1563827457 last 1563827380 Jul 22 13:33:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 13:34:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 13:34:22 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 13:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 13:36:37 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 22 13:41:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 13:41:08 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 22 13:44:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 13:44:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 13:46:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 13:46:55 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 22 13:48:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 13:48:48 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 22 13:51:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 13:51:15 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 13:53:32 fir-md1-s1 kernel: Lustre: 23688:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 13:55:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 13:55:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 13:56:13 fir-md1-s1 kernel: Lustre: 23644:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 13:56:13 fir-md1-s1 kernel: Lustre: 23644:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 22 13:57:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 13:57:17 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 13:58:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 13:58:49 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 22 13:59:20 fir-md1-s1 kernel: Lustre: 23560:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 22 14:01:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 14:01:23 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 22 14:07:08 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 22 14:07:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 14:07:28 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 14:08:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 14:08:53 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 14:10:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 14:10:38 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 14:11:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 14:11:29 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 22 14:16:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a61cbd1a-7731-7cf0-59ad-cf23e24c5463 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d4d588c00, cur 1563830209 expire 1563830059 last 1563829982 Jul 22 14:17:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 14:17:52 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 14:18:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 14:18:56 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 22 14:21:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 14:21:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 14:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 14:21:44 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 22 14:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7c219a89-16f8-7655-64c6-d7a3efc0b952 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25e193f000, cur 1563830704 expire 1563830554 last 1563830477 Jul 22 14:25:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 14:25:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7c219a89-16f8-7655-64c6-d7a3efc0b952 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2658037800, cur 1563830708 expire 1563830558 last 1563830481 Jul 22 14:25:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 14:28:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 14:28:09 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 14:28:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 14:28:57 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 14:31:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3dd418d5-f332-799f-3cf9-2baa97a15a38 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f135d008400, cur 1563831097 expire 1563830947 last 1563830870 Jul 22 14:31:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 14:31:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 14:31:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 14:31:51 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 22 14:38:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 14:38:12 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 14:39:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 14:39:08 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 22 14:42:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 14:42:18 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 22 14:42:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 50fc183c-d476-dddc-04cf-b547730d7d93 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28abc5e800, cur 1563831742 expire 1563831592 last 1563831515 Jul 22 14:42:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 14:45:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 14:45:28 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 14:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 14:48:16 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 22 14:49:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 14:49:09 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 22 14:52:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 829567cb-7d9d-a611-475e-a662f88c39aa (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32615c4c00, cur 1563832337 expire 1563832187 last 1563832110 Jul 22 14:52:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 14:52:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 14:52:24 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 22 14:58:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 14:58:14 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 22 14:58:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 14:58:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 14:59:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 14:59:15 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 22 15:02:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 15:02:44 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 22 15:06:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 285701a9-adfd-4231-dcc5-618319af0733 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f26bbf7e800, cur 1563833216 expire 1563833066 last 1563832989 Jul 22 15:06:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 15:09:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 15:09:20 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 15:09:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 15:09:34 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 22 15:11:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ce07a50a-0d5d-5fd6-e604-415b195e0852 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30cd491400, cur 1563833494 expire 1563833344 last 1563833267 Jul 22 15:11:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 15:12:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 15:12:44 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 22 15:15:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 15:15:06 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 15:19:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7e7c23de-b0f5-1e1b-905d-5f4c441c803a (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2632c41400, cur 1563833972 expire 1563833822 last 1563833745 Jul 22 15:19:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 15:19:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 15:19:34 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 15:19:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 15:19:55 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 22 15:23:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 15:23:02 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 22 15:26:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 15:29:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 15:29:53 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 15:30:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 15:30:20 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 15:33:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 15:33:05 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 15:36:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 15:36:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 15:40:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 15:40:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 15:41:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 15:41:11 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 15:43:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 15:43:07 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 22 15:48:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 15:48:46 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 15:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 15:50:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 15:51:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 15:51:29 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 15:53:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 15:53:08 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 22 15:57:58 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 16:00:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 16:00:22 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 16:00:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 504141ee-0c2c-97a6-183a-02c2bd8a6b39 (at 10.8.18.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f326cbeb800, cur 1563836424 expire 1563836274 last 1563836197 Jul 22 16:00:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 16:02:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 16:02:02 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 16:03:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 16:03:11 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 22 16:04:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 16:04:12 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 16:06:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8bdeb34b-1463-e572-e0ea-aa14c9b9e68b (at 10.8.27.27@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522b43800, cur 1563836794 expire 1563836644 last 1563836567 Jul 22 16:06:34 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 22 16:06:42 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client eb063a87-3fc2-413e-e5a9-3ea270493202 (at 10.8.27.27@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2520311000, cur 1563836802 expire 1563836652 last 1563836575 Jul 22 16:10:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 16:10:33 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 16:13:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 16:13:16 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 22 16:13:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 16:13:31 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 16:14:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 16:14:22 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 16:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 16:20:38 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 16:23:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 16:23:27 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 22 16:23:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 16:23:55 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 16:30:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 16:30:41 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 16:33:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 
(at 10.8.10.21@o2ib6) Jul 22 16:33:29 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 22 16:34:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 16:34:00 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 16:36:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 16:36:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 16:40:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 16:40:49 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 16:43:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 16:43:33 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 22 16:45:20 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563839113/real 1563839113] req@ffff8f090fb56c00 x1636741770637536/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563839120 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 16:45:20 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 22 16:45:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 16:45:26 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 22 16:45:28 fir-md1-s1 kernel: Lustre: 23698:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1430ebe300 x1637114054788112/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:3/0 lens 480/568 e 1 
to 0 dl 1563839133 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 16:45:41 fir-md1-s1 kernel: Lustre: 25678:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563839134/real 1563839134] req@ffff8f1072934500 x1636741770637552/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563839141 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 16:45:41 fir-md1-s1 kernel: Lustre: 25678:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 22 16:46:16 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563839169/real 1563839169] req@ffff8f090fb56c00 x1636741770637536/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563839176 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 22 16:46:16 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 22 16:46:32 fir-md1-s1 kernel: LustreError: 27319:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.9@o2ib6) returned error from glimpse AST (req@ffff8f090fb56c00 x1636741770637536 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f348aa93a80/0x5d9ee67efdf4ddb6 lrc: 4/0,0 mode: PW/PW res: [0x200029e31:0x3f:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.9.9@o2ib6 remote: 0xeb47bac23bcbd994 expref: 282 pid: 23759 timeout: 0 lvb_type: 0 Jul 22 16:46:32 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.9@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Jul 22 16:46:32 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 379s: evicting client at 10.8.9.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2dcaf12ac0/0x5d9ee67efdf4f15f lrc: 4/0,0 mode: PW/PW res: [0x200021720:0x5f26:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 
10.8.9.9@o2ib6 remote: 0xeb47bac23bcbd9b0 expref: 283 pid: 23597 timeout: 0 lvb_type: 0 Jul 22 16:46:32 fir-md1-s1 kernel: LustreError: 27319:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Jul 22 16:46:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 28fe2b86-0fea-f879-1428-0301d8d43c05 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a77c2bc00, cur 1563839206 expire 1563839056 last 1563838979 Jul 22 16:46:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 16:49:28 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 22 16:50:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 16:50:51 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 16:53:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 16:53:36 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 22 16:55:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f88d9959-9808-be23-7ee6-0289275f2164 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3db1ef1000, cur 1563839705 expire 1563839555 last 1563839478 Jul 22 16:55:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 16:56:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 16:56:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 17:00:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 17:00:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 17:03:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 17:03:37 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 22 17:04:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 17:04:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 17:06:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 17:06:21 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 17:07:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 17:08:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 17:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 17:10:56 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 17:13:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 17:13:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 17:13:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 17:13:46 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 22 17:17:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 17:17:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 17:17:27 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 22 17:18:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7ed1caa0-94bc-b05e-6c4f-937e860eea13 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f42fc3d2c00, cur 1563841093 expire 1563840943 last 1563840866 Jul 22 17:18:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 17:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3023cf14-8d75-798f-7dac-cf19c1731d20 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f29a676ec00, cur 1563841101 expire 1563840951 last 1563840874 Jul 22 17:18:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 22 17:21:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 17:21:08 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 17:23:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 17:23:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 17:23:47 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 22 17:27:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 17:27:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 22 17:28:18 fir-md1-s1 kernel: Lustre: 23591:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 22 17:28:18 fir-md1-s1 kernel: Lustre: 23591:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 22 17:28:22 fir-md1-s1 kernel: Lustre: 23609:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 22 17:28:22 fir-md1-s1 kernel: Lustre: 23609:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4187 previous similar messages Jul 22 17:28:43 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 17:28:47 fir-md1-s1 kernel: LustreError: 46511:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f26a2173c50 x1639524950072848/t0(0) o4->2f37c11e-3125-1e87-c1d9-aa856d778112@10.8.17.7@o2ib6:2/0 lens 488/448 e 1 to 0 dl 1563841742 ref 
1 fl Interpret:/0/0 rc 0/0 Jul 22 17:28:47 fir-md1-s1 kernel: LustreError: 46511:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 22 17:28:49 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 17:28:49 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1333947e00 Jul 22 17:28:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 2f37c11e-3125-1e87-c1d9-aa856d778112 (at 10.8.17.7@o2ib6), client will retry: rc = -110 Jul 22 17:28:49 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 22 17:28:50 fir-md1-s1 kernel: Lustre: 23739:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563841723/real 0] req@ffff8f33c97c6c00 x1636741813055344/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563841730 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 17:28:50 fir-md1-s1 kernel: Lustre: 23739:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 22 17:28:53 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 17:28:53 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ca6615600 Jul 22 17:28:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 83b4afa2-a367-a71c-8602-481ad43297ce (at 10.8.0.68@o2ib6), client will retry: rc -110 Jul 22 17:28:53 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 22 17:28:54 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f7f931800 Jul 22 17:28:54 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2018af7600 
Jul 22 17:28:54 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2add4e6c00 Jul 22 17:28:54 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2eecc30e00 Jul 22 17:28:54 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f207fb8e800 Jul 22 17:28:54 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f325bd48400 Jul 22 17:29:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 17:31:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 17:31:12 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 22 17:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 17:35:01 fir-md1-s1 kernel: Lustre: Skipped 196 previous similar messages Jul 22 17:37:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e25ee1c00, cur 1563842259 expire 1563842109 last 1563842032 Jul 22 17:38:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 17:38:06 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 22 17:41:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 17:41:15 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 17:41:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f208e011000, cur 1563842503 expire 1563842353 last 1563842276 Jul 22 17:45:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 17:45:06 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 22 17:48:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 17:48:11 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 22 17:51:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 17:51:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 17:55:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 17:55:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 17:55:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 17:55:26 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 22 17:58:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 17:58:27 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 22 18:01:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 18:01:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 18:01:25 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 18:05:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 18:05:30 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 22 18:07:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 18:07:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 18:08:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 18:08:47 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 18:11:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 18:11:39 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 18:15:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 18:15:43 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 18:18:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 18:18:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 18:20:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 18:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 18:21:43 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 22 18:25:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 18:25:56 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 22 18:28:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 18:28:52 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 22 18:30:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 22 18:30:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 18:31:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 18:31:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 22 18:36:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 18:36:22 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 22 18:40:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 18:40:16 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 22 18:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 18:41:58 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 18:42:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 18:42:58 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 18:46:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 18:46:25 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 22 18:50:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 18:50:16 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 18:52:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 18:52:54 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 18:53:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 18:53:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 18:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 18:56:25 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 22 19:00:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 19:00:28 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 22 19:03:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 19:03:03 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 22 19:03:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 19:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 19:06:45 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 19:11:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 19:11:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 22 19:13:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 19:13:14 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 22 19:16:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f317b970000, cur 1563848166 expire 1563848016 last 1563847939 Jul 22 19:16:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 19:16:50 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 22 19:16:54 fir-md1-s1 kernel: Lustre: 21461:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18acfee600 x1634318243770464/t0(0) o101->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:29/0 lens 480/568 e 1 to 0 dl 1563848219 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 19:16:54 fir-md1-s1 kernel: Lustre: 21461:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 22 19:17:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1d74580900/0x5d9ee67f1894d82d lrc: 3/0,0 mode: PR/PR res: [0x200029c10:0x11d3:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0x9ed6a5322d522aee expref: 31 
pid: 24578 timeout: 2963288 lvb_type: 0 Jul 22 19:17:08 fir-md1-s1 kernel: LustreError: 24578:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1614ace400 ns: mdt-fir-MDT0000_UUID lock: ffff8f1d74583a80/0x5d9ee67f1894e6dc lrc: 3/0,0 mode: PW/PW res: [0x200029c10:0x11d3:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.9.10@o2ib6 remote: 0x9ed6a5322d522b11 expref: 10 pid: 24578 timeout: 0 lvb_type: 0 Jul 22 19:17:08 fir-md1-s1 kernel: LustreError: 24578:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 22 19:17:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 19:17:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 19:17:24 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a54e28000 x1634318243776336/t0(0) o101->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:29/0 lens 480/568 e 1 to 0 dl 1563848249 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 19:21:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 19:21:06 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 19:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 19:23:19 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 19:26:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 19:26:51 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 22 19:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting 
Jul 22 19:33:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 19:33:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 19:33:53 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 19:35:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 19:35:27 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 19:37:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 19:37:04 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 19:43:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 19:43:28 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 22 19:43:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 19:43:59 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 22 19:47:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 19:47:05 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 19:48:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 19:48:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 19:53:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 19:53:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 19:54:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 19:54:00 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 22 19:57:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 19:57:07 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 22 20:01:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 20:01:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 20:03:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 20:03:35 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 22 20:04:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 20:04:22 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 22 20:07:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 20:07:18 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 22 20:13:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 20:13:44 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 22 20:15:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing 
former export from same NID Jul 22 20:15:34 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 20:16:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 20:16:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 20:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e8f00a400, cur 1563851820 expire 1563851670 last 1563851593 Jul 22 20:17:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 20:17:19 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 22 20:24:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 20:24:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 20:26:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 22 20:26:11 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 22 20:27:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 20:27:28 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 22 20:28:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 20:28:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 20:34:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 20:34:54 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 20:36:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 20:36:12 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 22 20:37:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 20:37:34 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 22 20:43:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 20:43:02 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 20:45:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 20:45:06 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 22 20:46:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 20:46:43 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 20:47:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 20:47:53 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 22 20:53:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 20:53:21 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 20:55:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 20:55:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Jul 22 20:55:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 22 20:55:26 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 20:55:35 fir-md1-s1 kernel: Lustre: 22279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854128/real 0] req@ffff8f18a8271e00 x1636741922747984/t0(0) o106->fir-MDT0000@10.8.29.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563854135 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:55:35 fir-md1-s1 kernel: Lustre: 22279:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 22 20:55:42 fir-md1-s1 kernel: Lustre: 23734:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854131/real 0] req@ffff8f2b527ead00 x1636741922753248/t0(0) o106->fir-MDT0002@10.8.18.1@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563854142 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:55:43 fir-md1-s1 kernel: Lustre: 97660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1dc6219b00 x1634121880650416/t0(0) o101->b37c54be-7fed-724b-d760-c5bd71b2a4e0@10.8.29.5@o2ib6:18/0 lens 480/568 e 1 to 0 dl 1563854148 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:55:45 fir-md1-s1 kernel: Lustre: 21305:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f33d131e300 x1639323908318576/t0(0) o103->7d0688b3-8792-1306-7035-fa281876a9e0@10.8.1.32@o2ib6:20/0 lens 328/224 e 1 to 0 dl 1563854150 ref 
2 fl Interpret:H/0/0 rc 0/0 Jul 22 20:55:48 fir-md1-s1 kernel: LustreError: 24564:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3482e0b850 x1638942944870704/t0(0) o4->4701937a-133f-fe89-5c21-0011f1e41f68@10.8.13.9@o2ib6:25/0 lens 504/448 e 1 to 0 dl 1563854155 ref 1 fl Interpret:/0/0 rc 0/0 Jul 22 20:55:48 fir-md1-s1 kernel: LustreError: 24564:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 9 previous similar messages Jul 22 20:55:49 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854142/real 0] req@ffff8f2660232d00 x1636741922760368/t0(0) o104->fir-MDT0002@10.8.1.25@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563854149 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:55:50 fir-md1-s1 kernel: Lustre: 46557:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3482e0b850 x1638942944870704/t0(0) o4->4701937a-133f-fe89-5c21-0011f1e41f68@10.8.13.9@o2ib6:25/0 lens 504/448 e 1 to 0 dl 1563854155 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:55:50 fir-md1-s1 kernel: Lustre: 46557:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 22 20:55:53 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 20:55:53 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32caf6c400 Jul 22 20:55:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 4701937a-133f-fe89-5c21-0011f1e41f68 (at 10.8.13.9@o2ib6), client will retry: rc = -110 Jul 22 20:55:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 22 20:55:56 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); client may timeout. 
req@ffff8f1dc6219b00 x1634121880650416/t0(0) o101->b37c54be-7fed-724b-d760-c5bd71b2a4e0@10.8.29.5@o2ib6:18/0 lens 480/536 e 1 to 0 dl 1563854148 ref 1 fl Complete:/0/0 rc 301/301 Jul 22 20:55:56 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3239170600 x1637984023543120/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:1/0 lens 480/568 e 0 to 0 dl 1563854161 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:56:00 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f281fb8a700 x1637984023549728/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:5/0 lens 480/568 e 0 to 0 dl 1563854165 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:56:00 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 22 20:56:11 fir-md1-s1 kernel: Lustre: 21366:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:21s); client may timeout. 
req@ffff8f33d131e300 x1639323908318576/t0(0) o103->7d0688b3-8792-1306-7035-fa281876a9e0@10.8.1.32@o2ib6:20/0 lens 328/192 e 1 to 0 dl 1563854150 ref 1 fl Complete:H/0/0 rc 0/0 Jul 22 20:56:11 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.1.25@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1f7ce6f740/0x5d9ee67f229587a8 lrc: 4/0,0 mode: PR/PR res: [0x2c002be2a:0x98d5:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.1.25@o2ib6 remote: 0x934e1d0b2e8eb5d5 expref: 41 pid: 24578 timeout: 2969231 lvb_type: 0 Jul 22 20:56:13 fir-md1-s1 kernel: Lustre: 22279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854161/real 0] req@ffff8f240b7c3900 x1636741922772880/t0(0) o106->fir-MDT0002@10.8.18.1@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563854173 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:56:13 fir-md1-s1 kernel: Lustre: 22279:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 22 20:56:17 fir-md1-s1 kernel: Lustre: 25083:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:26s); client may timeout. 
req@ffff8f2504d4b300 x1639421564063552/t0(0) o103->43c47423-a225-1e44-717a-5288b8e7b7db@10.8.8.37@o2ib6:21/0 lens 328/192 e 1 to 0 dl 1563854151 ref 1 fl Complete:H/0/0 rc 0/0 Jul 22 20:56:19 fir-md1-s1 kernel: LustreError: 23740:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2502d6b800 ns: mdt-fir-MDT0002_UUID lock: ffff8f34f7e11680/0x5d9ee67f22db128b lrc: 1/0,0 mode: EX/EX res: [0x2c002be2a:0x98d5:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x54801000000000 nid: 10.8.1.25@o2ib6 remote: 0x934e1d0b2e8eb5e3 expref: 3 pid: 23740 timeout: 0 lvb_type: 3 Jul 22 20:56:19 fir-md1-s1 kernel: Lustre: 23740:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:7s); client may timeout. req@ffff8f29f54f3000 x1631722997352320/t354352162673(0) o101->05c8b6b2-04ac-c002-5530-092914937d78@10.8.1.25@o2ib6:12/0 lens 376/1568 e 0 to 0 dl 1563854172 ref 1 fl Complete:/0/0 rc -107/-107 Jul 22 20:56:22 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 20:56:22 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f178cc03a00 Jul 22 20:56:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with aa7e6242-eb29-61e5-c533-531def72a1b7 (at 10.8.17.16@o2ib6), client will retry: rc = -110 Jul 22 20:56:22 fir-md1-s1 kernel: Lustre: 23591:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854175/real 0] req@ffff8f1123ff6000 x1636741922779136/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563854182 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:56:28 fir-md1-s1 kernel: Lustre: 21461:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-7), not sending early reply req@ffff8f3359551200 x1637984023612640/t0(0) 
o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:3/0 lens 480/568 e 0 to 0 dl 1563854193 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:56:28 fir-md1-s1 kernel: Lustre: 21461:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 22 20:56:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.27.25@o2ib6, removing former export from same NID Jul 22 20:56:43 fir-md1-s1 kernel: Lustre: Skipped 1293 previous similar messages Jul 22 20:56:54 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 20:56:59 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854212/real 0] req@ffff8f0fb7a43c00 x1636741922818192/t0(0) o106->fir-MDT0000@10.8.29.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563854219 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:56:59 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 22 20:57:00 fir-md1-s1 kernel: Lustre: 23628:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-15), not sending early reply req@ffff8f329337f800 x1637984023649296/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:4/0 lens 480/568 e 0 to 0 dl 1563854224 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:57:00 fir-md1-s1 kernel: Lustre: 23628:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 22 20:57:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ed86c244-737e-68dd-b8d9-63be8b21af77 (at 10.8.20.15@o2ib6) Jul 22 20:57:53 fir-md1-s1 kernel: Lustre: Skipped 6276 previous similar messages Jul 22 20:57:55 fir-md1-s1 kernel: LNetError: 55544:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.17.12@o2ib6 from 10.0.10.51@o2ib7 Jul 22 20:58:28 fir-md1-s1 kernel: Lustre: 
23630:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854282/real 0] req@ffff8f36bef09e00 x1636741922887456/t0(0) o106->fir-MDT0002@10.8.18.1@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1563854308 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 20:58:28 fir-md1-s1 kernel: Lustre: 23630:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 22 20:58:32 fir-md1-s1 kernel: Lustre: 25677:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-10), not sending early reply req@ffff8f37f9a41800 x1637984023812592/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:7/0 lens 480/568 e 0 to 0 dl 1563854317 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 20:58:32 fir-md1-s1 kernel: Lustre: 25677:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 22 20:59:21 fir-md1-s1 kernel: LNet: Service thread pid 22279 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 22 20:59:21 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 22 20:59:21 fir-md1-s1 kernel: Pid: 22279, comm: mdt01_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 22 20:59:21 fir-md1-s1 kernel: Call Trace: Jul 22 20:59:21 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 22 20:59:21 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 22 20:59:21 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 22 20:59:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 22 20:59:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 22 20:59:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 22 20:59:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 22 20:59:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 22 20:59:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854361.22279 Jul 22 20:59:36 fir-md1-s1 kernel: LNet: Service thread pid 23591 was inactive for 200.71s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 22 20:59:36 fir-md1-s1 kernel: Pid: 23591, comm: mdt00_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 22 20:59:36 fir-md1-s1 kernel: Call Trace: Jul 22 20:59:36 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 22 20:59:36 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 22 20:59:36 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 22 20:59:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 22 20:59:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 22 20:59:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 22 20:59:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 22 20:59:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 22 20:59:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854376.23591 Jul 22 20:59:37 fir-md1-s1 kernel: Pid: 23709, comm: mdt03_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 22 20:59:37 fir-md1-s1 kernel: Call Trace: Jul 22 20:59:37 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 22 20:59:37 
fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 22 20:59:37 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 22 20:59:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 22 20:59:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 22 20:59:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 22 20:59:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 22 20:59:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 22 20:59:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854377.23709 Jul 22 20:59:46 fir-md1-s1 kernel: LNet: Service thread pid 23637 was inactive for 200.26s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 22 20:59:46 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 22 20:59:46 fir-md1-s1 kernel: Pid: 23637, comm: mdt03_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 22 20:59:46 fir-md1-s1 kernel: Call Trace: Jul 22 20:59:46 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 22 20:59:46 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 22 20:59:46 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 22 20:59:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 22 20:59:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 22 20:59:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 22 20:59:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 22 20:59:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 22 20:59:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854386.23637 Jul 22 20:59:47 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 21:00:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 05c8b6b2-04ac-c002-5530-092914937d78 (at 10.8.1.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2428095800, cur 1563854403 expire 1563854253 last 1563854176 Jul 22 21:00:05 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1563854398/real 0] req@ffff8f09b4b6f200 x1636741922989520/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563854405 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 22 21:00:10 fir-md1-s1 kernel: LNet: Service thread pid 22279 completed after 248.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 22 21:00:12 fir-md1-s1 kernel: LNet: Service thread pid 23679 was inactive for 200.55s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 22 21:00:12 fir-md1-s1 kernel: Pid: 23679, comm: mdt02_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 22 21:00:12 fir-md1-s1 kernel: Call Trace: Jul 22 21:00:12 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Jul 22 21:00:12 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Jul 22 21:00:12 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Jul 22 21:00:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 22 21:00:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 22 21:00:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 22 21:00:12 
fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 22 21:00:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 22 21:00:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 22 21:00:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854412.23679 Jul 22 21:00:13 fir-md1-s1 kernel: LNet: Service thread pid 21410 was inactive for 200.79s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 22 21:00:15 fir-md1-s1 kernel: LNet: Service thread pid 97640 was inactive for 200.13s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 22 21:00:15 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 22 21:00:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854415.97640 Jul 22 21:00:21 fir-md1-s1 kernel: LNet: Service thread pid 20541 was inactive for 200.56s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jul 22 21:00:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854421.20541 Jul 22 21:00:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1563854422.23711 Jul 22 21:00:23 fir-md1-s1 kernel: Lustre: 23555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0ecf46dd00 x1635197616517472/t0(0) o101->04874f63-dfd7-2a1b-9b5b-da39adcf93d5@10.9.109.42@o2ib4:28/0 lens 1768/3288 e 0 to 0 dl 1563854428 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 21:00:23 fir-md1-s1 kernel: Lustre: 23555:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 22 21:00:27 fir-md1-s1 kernel: LNet: Service thread pid 23591 completed after 252.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 22 21:00:27 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.65@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0bfb112d00/0x5d9ee67f1bc0cc41 lrc: 4/0,0 mode: PR/PR res: [0x2c002c2b4:0x12a83:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.65@o2ib6 remote: 0xe44886eb344b40da expref: 2641 pid: 23739 timeout: 2969487 lvb_type: 0 Jul 22 21:00:27 fir-md1-s1 kernel: LustreError: 20378:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f062627d400 x1636741923278432/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 22 21:00:27 fir-md1-s1 kernel: LustreError: 20378:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 7 previous similar messages Jul 22 21:00:56 fir-md1-s1 kernel: LustreError: 55491:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2191fa2450 x1636673440431136/t0(0) o256->327c2a50-dba2-1c9c-0f3d-801872275c5c@10.8.18.26@o2ib6:8/0 lens 304/240 e 1 to 0 dl 1563854468 ref 1 fl Interpret:/0/0 rc 0/0 Jul 22 21:00:56 fir-md1-s1 kernel: LustreError: 55491:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 22 21:00:57 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 22 21:00:57 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e43dcee00 Jul 22 21:00:57 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e102dba00 Jul 22 21:00:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2eae4f7c00 Jul 22 21:00:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f350c6ce400 Jul 22 21:01:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28ad8d4600 Jul 22 21:01:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f26909e0400 Jul 22 21:01:00 fir-md1-s1 kernel: LustreError: 55542:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3f61e82450 x1638708200913328/t0(0) o256->a9534d35-abb5-3045-b46f-82a5a3c25826@10.8.17.20@o2ib6:10/0 lens 304/240 e 1 to 0 dl 1563854470 ref 1 fl Interpret:/0/0 rc 0/0 Jul 22 21:01:00 fir-md1-s1 kernel: LustreError: 55542:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 22 21:01:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2949b09600 Jul 22 21:01:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24f4973200 Jul 22 21:01:01 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f332ef23400 Jul 22 21:01:01 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38a7b83200 Jul 22 21:01:01 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39ea65ba00 Jul 22 21:01:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 166 seconds. I think it's dead, and I am evicting it. exp ffff8f369d224800, cur 1563854490 expire 1563854340 last 1563854324 Jul 22 21:03:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 21:03:29 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 21:03:34 fir-md1-s1 kernel: Lustre: 20724:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f246cb77800 x1634318949244096/t0(0) o101->a6b91a43-6f67-a7e7-0e97-a87e8033e0cf@10.8.9.10@o2ib6:9/0 lens 480/568 e 0 to 0 dl 1563854619 ref 2 fl Interpret:/0/0 rc 0/0 Jul 22 21:04:39 fir-md1-s1 kernel: LustreError: 24578:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563854589, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1589a81f80/0x5d9ee67f2350fe74 lrc: 3/0,1 mode: --/PW res: [0x200029c11:0xf1:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24578 timeout: 0 lvb_type: 0 Jul 22 21:04:39 fir-md1-s1 kernel: LustreError: 24578:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Jul 22 21:05:01 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563854611, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f300c6669c0/0x5d9ee67f2358a0f5 lrc: 3/0,1 mode: --/PW res: [0x200029c10:0x11d4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23704 timeout: 0 lvb_type: 0 Jul 22 21:05:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 21:05:27 fir-md1-s1 kernel: Lustre: Skipped 8452 previous similar messages Jul 22 21:05:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f34fd910d80/0x5d9ee67f2343f7bf lrc: 3/0,0 mode: PR/PR res: 
[0x200029c11:0xf1:0x0].0x0 bits 0x5b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0x9ed6a532372e1ae7 expref: 83 pid: 23622 timeout: 2969798 lvb_type: 0 Jul 22 21:05:38 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2520390400 ns: mdt-fir-MDT0000_UUID lock: ffff8f300c6669c0/0x5d9ee67f2358a0f5 lrc: 3/0,0 mode: PW/PW res: [0x200029c10:0x11d4:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x50200000000000 nid: 10.8.9.10@o2ib6 remote: 0x9ed6a532372e2bf7 expref: 82 pid: 23704 timeout: 0 lvb_type: 0 Jul 22 21:05:38 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 22 21:07:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 21:07:35 fir-md1-s1 kernel: Lustre: Skipped 3095 previous similar messages Jul 22 21:08:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 22 21:08:12 fir-md1-s1 kernel: Lustre: Skipped 6617 previous similar messages Jul 22 21:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a6b91a43-6f67-a7e7-0e97-a87e8033e0cf (at 10.8.9.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d9b58dc00, cur 1563854965 expire 1563854815 last 1563854738 Jul 22 21:13:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 21:13:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 21:15:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 21:15:32 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 21:17:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 21:17:41 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 22 21:18:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 21:18:32 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 22 21:25:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 21:25:36 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 22 21:28:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 21:28:41 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 22 21:28:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 21:28:48 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 22 21:30:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 21:30:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 21:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 21:35:46 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 21:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 22 21:39:04 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 22 21:39:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 21:39:59 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 22 21:44:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 21:44:19 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 22 21:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 21:46:00 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 22 21:49:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 21:49:06 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 22 21:51:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 22 21:51:09 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 22 21:53:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f26985dc800, cur 1563857605 expire 1563857455 last 1563857378 Jul 22 21:54:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 21:54:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 22 21:56:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 21:56:22 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 22 21:59:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 21:59:23 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 22 22:01:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 22 22:01:43 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 22 22:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 22:06:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 22 22:09:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 22:09:40 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 22 22:11:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 22:11:49 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 22:13:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ef6ccc800, cur 1563858807 expire 1563858657 last 1563858580 Jul 22 22:14:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 22:14:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 22:16:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 22:16:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 22 22:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2286904c00, cur 1563859147 expire 1563858997 last 1563858920 Jul 22 22:19:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 22 22:19:53 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 22 22:21:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 22:21:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 22 22:24:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 22:24:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 22:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 22:26:46 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 22 22:30:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 22:30:01 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 22 22:31:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 22:31:55 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 22 22:36:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 22:36:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 22 22:38:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 22:38:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 22 22:40:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 22:40:06 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 22 22:42:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 22:42:01 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 22:48:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 22:48:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 22:50:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 
10.8.12.12@o2ib6) Jul 22 22:50:06 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 22 22:50:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 22:50:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 22:52:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 22:52:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 22 22:58:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 22 22:58:17 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 22 23:00:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 23:00:10 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 22 23:01:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 23:02:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 23:02:22 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 22 23:08:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 23:08:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 22 23:10:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 23:10:16 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 22 23:12:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 22 23:12:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 23:12:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 23:12:28 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 22 23:18:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 23:18:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 22 23:20:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 23:20:18 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 22 23:22:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 23:22:29 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 22 23:23:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 22 23:23:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 23:28:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 22 23:28:43 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 22 23:30:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 22 23:30:53 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 22 23:32:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 23:32:41 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 22 23:38:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 23:38:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 22 23:38:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 22 23:38:59 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 22 23:41:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 23:41:01 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 22 23:43:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 23:43:51 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 22 23:49:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 23:49:04 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 22 23:49:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 22 23:49:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 22 23:51:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 22 23:51:02 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 22 23:55:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 22 23:55:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 22 23:59:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 22 23:59:21 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 00:01:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 00:01:09 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 23 00:06:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 00:06:09 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 23 00:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 00:09:36 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 00:10:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 00:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 00:11:12 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 23 00:15:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 00:15:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 00:16:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 00:16:13 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 23 00:19:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 00:19:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 23 00:21:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 00:21:34 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 23 00:29:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 00:29:04 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 00:29:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 00:29:59 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 23 00:30:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 00:31:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 00:31:35 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 23 00:33:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 00:34:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 00:36:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 00:40:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 00:40:00 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 23 00:40:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 00:41:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 23 00:41:06 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 23 00:41:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 00:41:38 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 23 00:50:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 00:50:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 00:51:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 00:51:07 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 23 00:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 00:51:51 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 23 00:58:16 fir-md1-s1 kernel: LustreError: 
137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:00:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 01:00:30 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 01:00:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:00:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 01:01:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 01:01:10 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 23 01:01:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 01:01:52 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 23 01:08:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:10:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 01:10:45 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 23 01:11:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 01:11:53 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 23 01:12:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22e4302400, cur 1563869538 expire 1563869388 last 1563869311 Jul 23 01:14:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 01:14:41 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 23 01:18:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:18:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 01:20:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 01:20:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 01:21:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 01:21:53 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 23 01:23:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:24:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 01:24:44 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 01:30:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 01:30:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 01:31:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 01:31:54 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 23 01:34:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:34:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 01:35:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client df8d4efa-7e68-68c2-c181-c780d6f0cc9a (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3992e9e800, cur 1563870908 expire 1563870758 last 1563870681 Jul 23 01:38:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 01:38:37 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 01:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 01:40:59 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 01:42:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 01:42:05 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 23 01:47:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 01:47:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 01:48:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 01:48:42 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 23 01:51:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 01:51:11 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 01:52:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 01:52:05 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 23 01:58:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 01:58:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 01:58:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 01:58:48 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 23 02:01:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 02:01:56 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 02:02:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 02:02:17 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 23 02:08:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 02:08:57 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 23 02:09:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect 
from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 02:09:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 02:12:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 02:12:02 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 02:12:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 02:12:26 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 23 02:19:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 02:19:35 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 02:21:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 02:21:18 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 23 02:22:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 02:22:03 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 02:22:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 02:22:30 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 23 02:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 02:32:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 02:32:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 02:32:33 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 02:33:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 02:33:28 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 23 02:33:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 02:33:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 02:42:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 02:42:12 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 02:42:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 02:42:37 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 23 02:43:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 02:43:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 02:44:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 02:44:58 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 02:52:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 02:52:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 02:52:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 02:52:38 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 23 02:55:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 02:55:25 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 23 03:02:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 03:02:36 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 03:03:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Jul 23 03:03:03 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 23 03:06:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 03:06:27 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 03:09:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 03:09:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 03:12:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 03:12:39 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 03:13:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 03:13:21 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 23 03:16:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 03:16:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 03:17:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 03:17:17 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 23 03:21:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 03:21:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 03:22:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 03:22:41 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 03:23:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 03:23:22 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 23 03:28:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 03:28:21 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 03:29:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 03:29:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 03:32:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 03:32:49 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 03:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 03:33:24 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 23 03:38:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 03:38:23 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 23 03:40:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 03:40:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 03:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 03:43:33 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 03:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 03:43:33 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 23 03:48:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 03:48:24 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 23 03:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 03:53:46 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 03:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 03:53:46 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 23 03:58:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 03:58:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 03:59:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 03:59:28 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 23 04:04:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 04:04:02 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 23 04:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 04:04:03 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 23 04:09:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 04:09:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 04:10:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 04:10:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 04:13:39 fir-md1-s1 kernel: LustreError: 46531:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1b20966050 x1631353393742496/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:28/0 lens 488/448 e 1 to 0 dl 1563880438 ref 1 fl Interpret:/0/0 rc 0/0 Jul 23 04:13:39 fir-md1-s1 kernel: LustreError: 46531:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 23 04:13:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 23 04:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 04:14:06 fir-md1-s1 kernel: Lustre: Skipped 183476 previous similar messages Jul 23 04:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 04:14:06 fir-md1-s1 kernel: Lustre: Skipped 183505 previous similar messages Jul 23 04:20:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 04:20:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 04:21:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 04:21:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 23 04:24:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 04:24:43 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 23 04:24:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 04:24:43 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 23 04:31:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 04:31:43 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 23 04:32:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 04:32:20 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 04:34:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 04:34:49 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 23 04:35:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 04:35:03 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 23 04:41:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 04:41:46 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 04:44:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 04:44:51 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 04:44:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 04:44:51 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 23 04:45:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 04:45:19 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 23 04:52:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 04:52:09 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 04:54:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 04:54:57 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 23 04:55:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 04:55:43 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 23 04:55:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 04:55:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 05:02:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 05:02:12 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 23 05:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 05:05:05 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 23 05:05:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 05:05:54 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 23 05:13:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 05:13:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 23 05:15:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 05:15:05 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 23 05:15:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 05:15:56 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 05:18:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 05:18:15 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 23 05:24:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 05:24:57 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 23 05:25:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 05:25:13 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 05:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 05:27:38 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 23 05:31:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f18b43abc00, cur 1563885090 expire 1563884940 last 1563884863 Jul 23 05:32:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 05:32:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 05:34:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 05:35:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 05:35:05 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 05:35:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 05:35:19 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 23 05:37:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 05:37:45 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 05:39:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 05:41:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 05:42:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 05:45:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 05:45:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 05:45:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 05:45:12 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 23 05:45:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 05:45:40 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 23 05:47:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 05:47:55 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 23 05:50:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 05:50:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 05:55:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 05:55:44 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 23 05:57:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 05:57:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 23 05:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 05:58:06 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 23 06:01:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 06:05:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 06:05:47 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 23 06:08:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 06:08:16 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 06:09:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 06:09:16 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 23 06:15:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 06:15:47 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 23 06:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 06:18:31 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 23 06:19:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 06:19:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 06:25:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 06:25:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 06:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 06:25:47 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 23 06:28:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 06:28:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 06:28:56 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 06:29:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 23 06:29:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 23 06:33:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 06:33:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 06:35:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 06:35:50 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 23 06:39:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 06:39:06 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 06:40:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 06:40:22 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 06:45:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 06:45:51 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 23 06:49:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 06:49:21 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 06:50:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Jul 23 06:50:22 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 23 06:51:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 06:51:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 06:52:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 06:55:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 06:55:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 06:55:52 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 23 06:59:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 06:59:27 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 07:00:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 07:00:26 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 23 07:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 07:05:52 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 07:10:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 07:10:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 07:11:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP 
connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 07:11:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 07:16:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 07:16:22 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 23 07:17:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 07:20:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 07:20:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 23 07:21:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 07:21:28 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 07:23:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 07:26:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 07:26:43 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 23 07:27:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 07:30:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 07:30:59 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 23 07:32:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e5764c400, cur 1563892325 expire 1563892175 last 1563892098 Jul 23 07:32:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 07:33:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 07:33:24 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 23 07:36:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 07:36:57 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 23 07:36:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 07:36:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 07:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 07:41:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 07:43:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 07:43:31 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 07:44:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 07:44:08 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 07:47:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 07:47:09 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 23 07:52:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 07:52:01 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 07:53:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 07:53:51 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 23 07:57:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 07:57:26 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 23 07:57:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 07:57:45 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 08:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 08:02:10 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 23 08:03:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 08:03:56 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 08:07:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 08:07:37 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 23 08:12:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 08:12:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 08:14:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 08:14:21 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 08:17:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 08:17:47 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 23 08:22:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 08:22:59 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 08:24:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 08:24:21 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 08:26:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 08:26:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 08:27:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 08:27:47 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 23 08:29:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 08:29:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 08:33:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 08:33:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 08:33:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 08:33:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 08:34:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 23 08:34:37 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 23 08:38:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 08:38:11 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 23 08:39:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 08:43:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 08:43:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 08:45:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 49981e1a-27d6-cefb-c17b-fa6c3fdb2591 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34aec1d800, cur 1563896717 expire 1563896567 last 1563896490 Jul 23 08:45:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 08:45:58 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 23 08:48:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 08:48:28 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 23 08:51:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 08:51:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 08:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 08:53:12 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 23 08:56:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 08:56:02 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 08:58:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 08:58:41 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 23 09:01:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 09:01:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 09:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 09:03:22 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 09:06:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 09:06:36 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 23 09:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 09:09:04 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 09:13:06 fir-md1-s1 kernel: Lustre: 22284:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1fa83d0600 x1637400903667328/t0(0) o36->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:11/0 lens 528/2888 e 1 to 0 dl 1563898391 ref 2 fl Interpret:/0/0 rc 0/0 Jul 23 09:13:06 fir-md1-s1 kernel: Lustre: 22284:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 23 09:13:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 09:13:18 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 23 09:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 09:13:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 09:17:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 09:17:26 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 23 09:19:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 09:19:23 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 23 09:23:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 09:23:57 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 23 09:27:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 09:27:34 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 23 09:28:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 09:28:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 09:29:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 09:29:27 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 09:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 09:34:21 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 09:39:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 09:39:28 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 23 09:41:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 09:41:45 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 09:42:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f246bb6f400, cur 1563900172 expire 1563900022 last 1563899945 Jul 23 09:42:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 23 09:44:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 09:44:27 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 09:44:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 09:44:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 09:50:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 09:50:00 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 23 09:53:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 09:53:01 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 09:54:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 09:54:28 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 23 09:56:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 09:56:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 10:00:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 10:00:04 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 23 10:03:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 10:03:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 10:04:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 10:04:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 10:07:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 10:07:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 10:10:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 10:10:07 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 10:15:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 10:15:12 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 10:16:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 10:16:04 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 23 10:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3434797800, cur 1563902169 expire 1563902019 last 1563901942 Jul 23 10:17:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 10:17:48 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 10:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 10:20:38 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 23 10:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 10:25:22 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 10:27:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 10:27:42 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 23 10:31:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 10:31:14 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 23 10:35:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 10:35:45 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 10:37:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 10:37:57 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 23 10:41:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 10:41:21 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 23 10:45:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 10:45:49 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 23 10:49:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 10:49:08 fir-md1-s1 kernel: Lustre: Skipped 
56 previous similar messages Jul 23 10:51:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 10:51:31 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 23 10:55:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 10:55:50 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 10:55:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 10:55:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 10:59:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 10:59:09 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 23 11:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 11:01:57 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Jul 23 11:03:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 11:05:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 11:05:54 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 11:10:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 11:10:35 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 11:11:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 11:11:59 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 23 11:15:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 11:15:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 11:21:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 11:21:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 11:22:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 11:22:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 11:22:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 11:22:27 fir-md1-s1 kernel: Lustre: Skipped 117263 previous similar messages Jul 23 11:22:27 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 11:23:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 11:25:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 11:26:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 11:26:00 fir-md1-s1 kernel: Lustre: Skipped 117233 previous similar messages Jul 23 11:31:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 11:31:13 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 11:32:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 11:32:27 fir-md1-s1 kernel: Lustre: Skipped 13948 previous similar messages Jul 23 11:36:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 11:36:20 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 23 11:36:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 11:36:29 fir-md1-s1 kernel: Lustre: Skipped 13973 previous similar messages Jul 23 11:40:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 11:42:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 11:42:44 fir-md1-s1 kernel: Lustre: Skipped 55086 previous similar messages Jul 23 11:46:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 11:46:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 11:46:52 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 23 11:46:52 fir-md1-s1 kernel: Lustre: Skipped 55034 previous similar messages Jul 23 11:52:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 11:52:49 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 23 11:54:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 11:54:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 11:57:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 11:57:02 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 23 12:03:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 12:03:14 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 23 12:04:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 12:04:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 12:05:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 12:05:04 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 23 12:07:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 12:07:11 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 23 12:13:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 12:13:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 23 12:15:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 12:15:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 23 12:17:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 12:17:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 12:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 12:18:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 12:23:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 12:23:33 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 23 12:25:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 12:25:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 12:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 12:27:33 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 12:33:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 12:33:51 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 23 12:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f21d9b6f000, cur 1563910577 expire 1563910427 last 1563910350 Jul 23 12:37:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 12:37:33 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 23 12:37:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 12:37:51 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 12:38:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 12:38:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 12:40:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 12:42:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 12:42:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 12:44:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 12:44:03 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 23 12:47:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 12:47:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 23 12:47:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 12:47:59 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 12:54:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 12:54:22 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 23 12:57:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 12:57:57 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 23 12:58:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 12:58:28 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 23 13:04:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 
(at 10.8.11.6@o2ib6) Jul 23 13:04:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 23 13:06:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:06:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 13:08:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 13:08:21 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 23 13:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 13:08:53 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 13:14:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 13:14:25 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 23 13:18:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 13:18:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 13:18:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 13:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 13:19:11 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 13:24:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 13:24:31 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 23 13:27:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:28:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:28:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 13:28:25 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 13:28:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 13:29:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 23 13:32:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 13:34:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 13:34:31 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 23 13:38:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 13:39:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 13:39:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 13:39:36 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 23 13:44:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:44:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 13:44:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 13:44:32 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 23 13:49:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 23 13:49:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 13:50:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 13:50:00 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 13:50:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 13:50:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 13:55:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 13:55:06 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 23 14:00:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 14:00:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 14:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 14:00:02 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 14:00:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 14:00:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 14:05:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 14:05:07 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 23 14:10:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 14:10:10 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 14:10:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 14:10:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 14:15:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 14:15:08 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 23 14:16:36 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 23 14:16:36 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 10 previous similar messages Jul 23 14:19:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 14:19:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 14:20:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 14:20:26 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 14:20:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 14:20:31 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 14:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 14:25:29 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 23 14:29:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 14:29:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 14:30:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 14:30:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 14:30:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 14:30:46 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 14:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 14:35:39 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 14:40:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 14:40:35 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 14:40:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Jul 23 14:40:51 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 14:45:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 14:45:39 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 23 14:47:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 14:47:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 14:50:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 14:50:49 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 23 14:50:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 14:50:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 23 14:55:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 14:55:55 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 23 15:00:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 15:00:54 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 15:01:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 15:01:05 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 15:05:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 15:05:56 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 23 15:09:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: 
not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 15:09:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 15:11:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 15:11:15 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 23 15:11:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 15:11:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 23 15:16:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 15:16:03 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 23 15:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 15:21:22 fir-md1-s1 kernel: Lustre: Skipped 50681 previous similar messages Jul 23 15:21:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 15:22:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 15:22:17 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 15:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 15:26:15 fir-md1-s1 kernel: Lustre: Skipped 93087 previous similar messages Jul 23 15:27:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 15:29:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 15:30:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 15:30:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 15:31:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 15:31:22 fir-md1-s1 kernel: Lustre: Skipped 42467 previous similar messages Jul 23 15:32:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 15:32:44 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 23 15:35:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 15:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 15:36:24 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 23 15:39:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 15:39:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 15:41:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 15:41:23 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 23 15:43:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 15:43:59 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 15:46:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 15:46:28 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 23 15:47:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 15:47:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 15:48:08 fir-md1-s1 kernel: Lustre: 23727:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563922081/real 1563922081] req@ffff8f451debce00 x1636742926666384/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563922088 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 23 15:48:08 fir-md1-s1 kernel: Lustre: 23727:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 23 15:48:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 23 15:51:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 15:51:24 fir-md1-s1 kernel: Lustre: Skipped 395 previous similar messages Jul 23 15:55:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID 
Jul 23 15:55:12 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 23 15:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 15:56:37 fir-md1-s1 kernel: Lustre: Skipped 412 previous similar messages Jul 23 15:56:46 fir-md1-s1 kernel: LustreError: 25631:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f212aadf050 x1631353403675872/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:14/0 lens 488/448 e 0 to 0 dl 1563922634 ref 1 fl Interpret:/0/0 rc 0/0 Jul 23 15:56:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 23 15:57:53 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563922662/real 1563922662] req@ffff8f06d1cfb900 x1636742995171888/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563922673 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 23 15:57:53 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 23 15:57:57 fir-md1-s1 kernel: Lustre: 20722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f21871f4e00 x1636455816722672/t0(0) o101->57c4b5d7-5f64-7a71-67bf-cc14ebefeb9d@10.9.102.25@o2ib4:2/0 lens 576/3264 e 1 to 0 dl 1563922682 ref 2 fl Interpret:/0/0 rc 0/0 Jul 23 15:57:57 fir-md1-s1 kernel: Lustre: 20722:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 110 previous similar messages Jul 23 15:58:01 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1b2f31b300 x1631585138225920/t0(0) o101->409782ab-594c-0837-10bd-459bd6e52b7f@10.9.106.26@o2ib4:6/0 lens 576/0 e 1 to 0 dl 1563922686 ref 2 fl 
New:/0/ffffffff rc 0/-1 Jul 23 15:58:01 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1025 previous similar messages Jul 23 15:58:04 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f2fc840e300 x1631840442656992/t0(0) o101->533f2d59-21df-dd34-d3a6-f780aca8b580@10.8.25.3@o2ib6:3/0 lens 576/0 e 1 to 0 dl 1563922683 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 23 15:58:04 fir-md1-s1 kernel: LustreError: 97638:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.104.51@o2ib4: deadline 20:1s ago req@ffff8f1daf7cb900 x1631546992631600/t0(0) o101->603ef852-66df-b745-900b-b12995ddbb59@10.9.104.51@o2ib4:3/0 lens 576/0 e 1 to 0 dl 1563922683 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 23 15:58:04 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 517 previous similar messages Jul 23 15:58:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 15:58:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 16:01:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 16:01:31 fir-md1-s1 kernel: Lustre: Skipped 315 previous similar messages Jul 23 16:05:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 16:05:12 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 23 16:06:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 16:06:59 fir-md1-s1 kernel: Lustre: Skipped 329 previous similar messages Jul 23 16:11:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 16:11:43 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 16:15:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 16:15:14 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 23 16:15:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 16:15:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 16:17:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 16:17:01 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 23 16:22:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 16:22:19 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 16:25:50 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23b9382a00 x1633661216619952/t0(0) o36->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:25/0 lens 496/448 e 1 to 0 dl 1563924355 ref 2 fl Interpret:/0/0 rc 0/0 Jul 23 16:25:50 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 215 previous similar messages Jul 23 16:25:51 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e1997f500 x1631608869667552/t0(0) o101->9d9a34f1-f4e0-0f10-cc72-f899159f3999@10.9.108.44@o2ib4:26/0 lens 576/3264 e 1 to 0 dl 1563924356 ref 2 fl Interpret:/0/0 rc 0/0 Jul 23 16:25:51 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 81 previous similar messages Jul 23 16:25:53 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d5823f200 x1631348254671520/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:28/0 lens 576/0 e 1 to 0 dl 1563924358 ref 2 fl New:/2/ffffffff rc 0/-1 Jul 23 16:25:53 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7362 previous similar messages Jul 23 16:25:57 fir-md1-s1 kernel: Lustre: 
20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ea5f40900 x1635088965258848/t0(0) o101->d6c95989-a33e-02cc-37c5-1e98ca81c68c@10.9.105.2@o2ib4:2/0 lens 576/0 e 1 to 0 dl 1563924362 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 23 16:25:57 fir-md1-s1 kernel: Lustre: 20727:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7646 previous similar messages Jul 23 16:26:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e291e8d80/0x5d9ee6804af7de00 lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2267a5f94 expref: 119 pid: 97646 timeout: 3039424 lvb_type: 0 Jul 23 16:26:05 fir-md1-s1 kernel: Lustre: 97642:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); client may timeout. 
req@ffff8f24a8ac0f00 x1631811001509600/t0(0) o101->43d4c491-0147-e9a3-8154-08fbbbab65ce@10.8.25.11@o2ib6:26/0 lens 576/0 e 1 to 0 dl 1563924356 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 23 16:26:05 fir-md1-s1 kernel: LustreError: 20460:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.24.31@o2ib6: deadline 20:1s ago req@ffff8f24e7f68600 x1631784681571520/t0(0) o101->2af570cb-795a-fe3c-3d97-371ab72a526d@10.8.24.31@o2ib6:3/0 lens 576/0 e 1 to 0 dl 1563924363 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 23 16:26:05 fir-md1-s1 kernel: LustreError: 20460:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 21 previous similar messages Jul 23 16:26:05 fir-md1-s1 kernel: Lustre: 97642:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 15743 previous similar messages Jul 23 16:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 16:27:10 fir-md1-s1 kernel: Lustre: Skipped 2779 previous similar messages Jul 23 16:27:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 16:27:18 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 23 16:27:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 16:27:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 16:32:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 16:32:35 fir-md1-s1 kernel: Lustre: Skipped 2745 previous similar messages Jul 23 16:37:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 16:37:26 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 23 16:38:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 16:38:42 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 23 16:42:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 16:42:38 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 23 16:42:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 23 16:42:52 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 23 16:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 16:47:34 fir-md1-s1 kernel: Lustre: Skipped 78939 previous similar messages Jul 23 16:47:37 fir-md1-s1 kernel: Lustre: 21679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f273eb2dd00 x1631353405343344/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:12/0 lens 376/1600 e 0 to 0 dl 1563925662 ref 2 fl Interpret:/0/0 rc 0/0 Jul 23 16:47:37 fir-md1-s1 kernel: Lustre: 21679:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 611 previous similar messages Jul 23 16:48:42 fir-md1-s1 kernel: LustreError: 
23659:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1563925632, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f276da5ba80/0x5d9ee6806cd50561 lrc: 3/0,1 mode: --/EX res: [0x2c002c626:0x7:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23659 timeout: 0 lvb_type: 0 Jul 23 16:49:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e1f434140/0x5d9ee6806cd48af6 lrc: 3/0,0 mode: CR/CR res: [0x2c002c626:0x7:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2267a8ce3 expref: 38 pid: 23716 timeout: 3040841 lvb_type: 0 Jul 23 16:49:41 fir-md1-s1 kernel: LustreError: 23659:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f25e614f800 ns: mdt-fir-MDT0002_UUID lock: ffff8f276da5f500/0x5d9ee6806cd5055a lrc: 1/0,0 mode: EX/EX res: [0x2c002c626:0x7:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x54801000000000 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2267a8cff expref: 8 pid: 23659 timeout: 0 lvb_type: 3 Jul 23 16:49:41 fir-md1-s1 kernel: Lustre: 23659:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:119s); client may timeout. 
req@ffff8f273eb2dd00 x1631353405343344/t354507247576(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:12/0 lens 376/1568 e 0 to 0 dl 1563925662 ref 1 fl Complete:/0/0 rc -107/-107 Jul 23 16:50:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 16:50:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 16:53:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 16:53:24 fir-md1-s1 kernel: Lustre: Skipped 78885 previous similar messages Jul 23 16:54:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 16:54:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 16:57:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 16:57:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 16:57:42 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 23 17:01:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 17:01:03 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 23 17:03:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 17:03:40 fir-md1-s1 kernel: Lustre: Skipped 97616 previous similar messages Jul 23 17:07:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 17:07:43 fir-md1-s1 kernel: Lustre: Skipped 97676 previous similar messages Jul 23 17:07:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 17:07:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 17:12:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 17:12:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 23 17:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 17:13:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 23 17:15:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 17:15:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 17:17:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 17:17:51 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 23 17:23:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 17:23:17 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 23 17:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 17:24:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 23 17:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 17:28:25 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 23 17:29:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 17:29:31 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 17:33:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 17:33:40 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 23 17:34:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 17:34:32 fir-md1-s1 kernel: Lustre: Skipped 27034 previous similar messages Jul 23 17:38:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 17:38:55 fir-md1-s1 kernel: Lustre: Skipped 89216 previous similar messages Jul 23 17:39:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 17:39:44 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 17:43:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 17:43:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 23 17:44:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 17:44:34 fir-md1-s1 kernel: Lustre: Skipped 62197 previous similar messages Jul 23 17:48:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 17:48:58 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 17:50:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 17:50:07 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 17:52:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2012a0c400, cur 1563929531 expire 1563929381 last 1563929304 Jul 23 17:54:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 17:54:39 fir-md1-s1 kernel: Lustre: Skipped 38747 previous similar messages Jul 23 17:55:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 17:55:32 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 17:59:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 17:59:10 fir-md1-s1 kernel: Lustre: Skipped 38786 previous similar messages Jul 23 18:04:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 18:04:44 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 23 18:06:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 18:06:02 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 18:09:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 18:09:17 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 23 18:09:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 18:09:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 18:15:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 18:15:22 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 23 18:16:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 18:16:27 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 23 18:19:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 18:19:38 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 23 18:20:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 18:20:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 18:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 18:25:31 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 23 18:26:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 18:26:28 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 23 18:27:38 fir-md1-s1 kernel: Lustre: 23584:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563931651/real 1563931651] req@ffff8f2c48290600 x1636743415179344/t0(0) o104->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1563931658 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 23 18:27:38 fir-md1-s1 kernel: Lustre: 23584:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 23 18:29:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 18:29:55 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 23 18:30:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 18:30:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 18:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 18:35:51 fir-md1-s1 kernel: Lustre: Skipped 21494 previous similar messages Jul 23 18:36:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 23 18:36:48 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 23 18:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 18:40:04 fir-md1-s1 kernel: Lustre: Skipped 21513 previous similar messages Jul 23 18:42:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 18:42:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 18:45:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 18:45:53 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 23 18:46:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 18:46:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 18:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 18:50:25 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 23 18:53:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 18:53:05 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 18:56:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 18:56:55 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 23 18:56:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 18:56:57 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 23 19:00:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 19:00:38 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 23 19:05:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 19:05:09 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 23 19:07:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 19:07:00 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 23 19:07:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 19:07:25 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 23 19:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 19:10:53 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 23 19:16:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 19:16:46 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 19:17:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 19:17:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 19:17:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 19:17:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 23 19:20:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 19:20:58 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 23 19:27:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 19:27:20 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 23 19:27:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 19:27:34 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 19:30:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 19:30:19 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 19:31:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 19:31:14 fir-md1-s1 kernel: Lustre: Skipped 8308 previous similar messages Jul 23 19:37:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 19:37:20 fir-md1-s1 kernel: Lustre: Skipped 14806 previous similar messages Jul 23 19:39:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 19:39:37 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 19:40:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 19:40:54 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 23 19:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 19:41:29 fir-md1-s1 kernel: Lustre: Skipped 6586 previous similar messages Jul 23 19:47:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 19:47:44 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 19:50:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 19:50:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 19:51:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 19:51:21 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 23 19:51:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 19:51:30 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 23 19:58:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 19:58:03 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 20:01:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 20:01:19 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 20:01:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 20:01:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 20:01:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 20:01:38 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 23 20:08:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 20:08:25 fir-md1-s1 kernel: Lustre: Skipped 3913 previous similar messages Jul 23 20:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 20:11:58 fir-md1-s1 kernel: Lustre: Skipped 3935 previous similar messages Jul 23 20:12:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 20:12:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 20:13:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 20:13:33 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 23 20:17:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f1dffa400, cur 1563938246 expire 1563938096 last 1563938019 Jul 23 20:18:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 20:18:37 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 20:20:39 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 23 20:22:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 20:22:00 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 23 20:23:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 20:23:05 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 23 20:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 20:28:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 23 20:31:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 20:32:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 20:32:15 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 23 20:33:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 23 20:33:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 20:38:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 20:38:56 fir-md1-s1 kernel: Lustre: Skipped 58746 previous similar messages Jul 23 20:42:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 20:42:20 fir-md1-s1 kernel: Lustre: Skipped 58765 previous similar messages Jul 23 20:43:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 20:43:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 20:46:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 20:46:04 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 20:48:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b36f5d000, cur 1563940084 expire 1563939934 last 1563939857 Jul 23 20:48:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c429ac87-0af5-acec-8a40-5e6c2e99ccb1 (at 10.9.101.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4507eb6000, cur 1563940125 expire 1563939975 last 1563939898 Jul 23 20:48:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 23 20:48:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 20:48:59 fir-md1-s1 kernel: Lustre: Skipped 60203 previous similar messages Jul 23 20:49:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 83acfeb9-f28f-d51c-b3c4-69b2ea2161d2 (at 10.9.101.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251ba9b800, cur 1563940151 expire 1563940001 last 1563939924 Jul 23 20:52:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 20:52:32 fir-md1-s1 kernel: Lustre: Skipped 60227 previous similar messages Jul 23 20:54:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 20:54:35 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 23 20:56:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 20:56:07 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 20:59:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 20:59:41 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 21:03:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 21:03:04 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 23 21:06:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 21:06:59 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 23 21:10:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 21:10:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 23 21:12:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 21:12:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 21:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 21:13:45 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 23 21:17:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 21:17:13 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 23 21:20:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 21:20:12 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 23 21:21:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2faf242000, cur 1563942107 expire 1563941957 last 1563941880 Jul 23 21:24:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 21:24:19 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 21:24:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 21:24:53 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 23 21:28:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 21:28:09 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 23 21:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 21:30:37 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 23 21:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 21:35:01 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 23 21:38:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 21:38:14 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 23 21:39:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 21:39:33 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 21:41:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 21:41:16 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 23 21:45:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 21:45:11 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 23 21:50:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 23 21:50:00 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 23 21:51:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Jul 23 21:51:21 fir-md1-s1 kernel: Lustre: Skipped 54955 previous similar messages Jul 23 21:55:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 21:55:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 21:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 21:56:04 fir-md1-s1 kernel: Lustre: Skipped 54973 previous similar messages Jul 23 22:00:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 22:00:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 23 22:01:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 22:01:21 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 22:05:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 22:05:44 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 22:06:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 22:06:05 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 22:10:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 22:10:14 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 22:11:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 22:11:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 23 22:13:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2a50a8e000, cur 1563945218 expire 1563945068 last 1563944991 Jul 23 22:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 22:16:28 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 23 22:20:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 22:20:25 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 23 22:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 22:22:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 22:22:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 22:22:55 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 22:26:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 23 22:26:36 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 23 22:30:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 22:30:33 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 23 22:32:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 22:32:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 23 22:34:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 22:34:19 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 22:36:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 22:36:41 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 23 22:41:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 22:41:40 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 23 22:43:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 22:43:37 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 23 22:46:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 22:46:41 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 23 22:50:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 22:50:09 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 23 22:51:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 22:51:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 23 22:53:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 23 22:53:43 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 23 22:56:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 22:56:51 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 23 23:02:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 23:02:59 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 23 23:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 23 23:03:59 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 23 23:04:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 23 23:04:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 23 23:06:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 23:06:57 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 23 23:14:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 23 23:14:01 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 23 23:15:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 23:15:05 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 23 23:15:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 23:15:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 23 23:17:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 23:17:06 fir-md1-s1 kernel: Lustre: Skipped 30522 previous similar messages Jul 23 23:24:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 23:24:02 fir-md1-s1 kernel: Lustre: Skipped 65108 previous similar messages Jul 23 23:25:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 23:25:10 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 23 23:26:34 fir-md1-s1 kernel: LustreError: 46558:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da4450 x1631353416493888/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:9/0 lens 488/448 e 0 to 0 dl 1563949599 ref 1 fl Interpret:/0/0 rc 0/0 Jul 23 23:26:34 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 23 23:27:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 23 23:27:31 fir-md1-s1 kernel: Lustre: Skipped 34717 previous similar messages Jul 23 23:30:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 23:33:14 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da3c50 x1631353416541616/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:13/0 lens 488/448 e 0 to 0 dl 1563950023 ref 1 fl Interpret:/0/0 rc 0/0 Jul 23 23:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 23 23:34:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 23:34:11 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 23 23:35:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 23 23:35:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 23 23:37:15 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 23 23:37:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 23 23:37:36 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 23 23:44:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2e802e8400, cur 1563950645 expire 1563950495 last 1563950418 Jul 23 23:44:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 23 23:44:18 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 23 23:46:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 23:46:59 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 23 23:47:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 23 23:47:55 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 23 23:50:33 fir-md1-s1 kernel: Lustre: 22428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f107c63e050 x1631570609989424/t0(0) o4->b1560181-32d0-3000-87fb-1969e5df2f5e@10.9.101.68@o2ib4:8/0 lens 488/448 e 1 to 0 dl 1563951038 ref 2 fl Interpret:/0/0 rc 0/0 Jul 23 23:54:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 23 23:54:35 fir-md1-s1 kernel: Lustre: Skipped 8010 previous similar messages Jul 23 23:55:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 23 23:55:59 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 23 23:57:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 23 23:57:01 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 23 23:57:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 23 23:58:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 23 23:58:30 fir-md1-s1 kernel: Lustre: Skipped 31629 previous similar messages Jul 24 00:04:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 00:04:38 fir-md1-s1 kernel: Lustre: Skipped 23633 previous similar messages Jul 24 00:07:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 00:08:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 00:08:38 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 00:08:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 00:08:52 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 24 00:13:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 00:13:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 00:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 00:14:44 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 24 00:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 00:18:45 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 24 00:19:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 00:19:23 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 24 00:25:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 00:25:01 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 24 00:28:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 00:28:48 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 00:29:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 00:29:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 00:29:51 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 24 00:35:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 00:35:13 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 00:39:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 00:39:23 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 24 00:40:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 00:40:29 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 24 00:40:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 00:40:30 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 00:45:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 00:45:26 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 24 00:49:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 00:49:36 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 24 00:50:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 00:50:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 00:52:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 24 00:52:02 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 00:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 00:55:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 24 00:59:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2747723c00, cur 1563955185 expire 1563955035 last 1563954958 Jul 24 00:59:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 00:59:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 24 01:01:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 01:01:33 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 01:02:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 01:02:42 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 24 01:06:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 01:06:01 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 01:09:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 01:09:50 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 24 01:13:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 24 01:13:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 01:16:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 01:16:11 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 24 01:16:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 01:16:55 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 01:20:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 01:20:03 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 24 01:24:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 01:24:15 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 24 01:26:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 01:26:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 01:28:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 01:28:55 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 24 01:30:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 01:30:16 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 24 01:35:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 01:35:25 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 24 01:36:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 01:36:34 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 01:39:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 01:39:25 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 01:40:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 01:40:31 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 24 01:46:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 01:46:01 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 24 01:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 01:47:15 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 01:49:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 01:49:43 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 24 01:50:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 01:50:38 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 24 01:57:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 01:57:49 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 24 01:57:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 01:57:50 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 01:59:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 01:59:59 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 02:00:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 02:00:58 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 24 02:07:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 02:07:50 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 24 02:08:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 02:08:19 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 24 02:10:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 02:10:04 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 24 02:10:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 02:10:59 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 24 02:17:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 02:17:54 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 24 02:18:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 02:18:51 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 24 02:20:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 02:20:10 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 02:21:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 02:21:03 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 24 02:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 02:27:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 02:29:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 02:29:58 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 24 02:30:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 02:30:28 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 24 02:31:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 02:31:05 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 24 02:38:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 02:38:15 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 02:40:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 02:40:11 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 24 02:41:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 02:41:21 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 02:41:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 02:41:21 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 24 02:48:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 02:48:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 24 02:51:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 02:51:29 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 02:51:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing 
former export from same NID Jul 24 02:51:37 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 24 02:54:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 02:54:06 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 02:56:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f332b9ec400, cur 1563962217 expire 1563962067 last 1563961990 Jul 24 02:58:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 02:58:55 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 24 03:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 03:01:57 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 24 03:02:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 03:02:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 24 03:05:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 03:05:47 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 03:08:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 03:08:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 03:12:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 03:12:09 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 24 03:13:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 03:13:43 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 03:18:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 03:18:58 fir-md1-s1 kernel: Lustre: Skipped 131203 previous similar messages Jul 24 03:22:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 03:22:10 fir-md1-s1 kernel: Lustre: Skipped 143779 previous similar messages Jul 24 03:24:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 03:24:38 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 03:26:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 24 03:26:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 24 03:29:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 03:29:05 fir-md1-s1 kernel: Lustre: Skipped 120050 previous similar messages Jul 24 03:32:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 03:32:20 fir-md1-s1 kernel: Lustre: Skipped 107569 previous similar messages Jul 24 03:33:26 fir-md1-s1 kernel: LustreError: 46558:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da5050 x1631353426651296/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:14/0 lens 488/448 e 1 to 0 dl 1563964424 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 03:33:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 03:35:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 03:35:08 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 03:36:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 03:36:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 03:39:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 03:39:27 fir-md1-s1 kernel: Lustre: Skipped 117374 previous similar messages Jul 24 03:42:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 03:42:35 fir-md1-s1 kernel: Lustre: Skipped 117374 previous similar messages Jul 24 03:46:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 03:46:59 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 24 03:47:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 03:49:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 03:49:40 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 03:52:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 03:52:51 fir-md1-s1 kernel: Lustre: Skipped 9078 previous similar messages Jul 24 03:57:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 03:57:05 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 03:59:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 03:59:54 fir-md1-s1 kernel: Lustre: Skipped 9053 previous similar messages Jul 24 04:03:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 04:03:27 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 24 04:03:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 04:03:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 04:09:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 04:09:03 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 24 04:10:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 04:10:05 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 04:13:09 fir-md1-s1 kernel: LustreError: 46588:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da0050 x1631353428981328/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:8/0 lens 488/448 e 0 to 0 dl 1563966818 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 04:13:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 04:13:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 04:13:55 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 24 04:15:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 04:15:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 04:18:39 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da4050 x1631353429023712/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 488/448 e 0 to 0 dl 1563967147 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 04:18:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 04:19:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 04:19:19 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 04:20:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 04:20:17 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 24 04:23:15 fir-md1-s1 kernel: LustreError: 46549:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f228a55b050 x1631353429058176/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:14/0 lens 488/448 e 0 to 0 dl 1563967424 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 04:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 04:24:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 04:24:48 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 24 04:28:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 04:28:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 04:28:38 fir-md1-s1 kernel: LustreError: 46588:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1c22603050 x1631353429102848/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 488/448 e 0 to 0 dl 1563967747 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 04:28:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 04:30:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 04:30:13 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 24 04:30:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 04:30:35 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 24 04:34:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 04:34:57 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 24 04:40:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 04:40:17 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 04:40:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 04:40:57 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 04:42:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 04:42:04 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 24 04:45:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 04:45:07 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 24 04:50:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 04:50:20 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 04:51:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 04:51:12 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 04:53:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 04:53:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 04:55:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 04:55:07 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 24 05:01:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 05:01:26 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 24 05:01:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 05:01:33 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 05:05:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 05:05:09 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 24 05:08:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 05:11:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b4429ec00, cur 1563970260 expire 1563970110 last 1563970033 Jul 24 05:11:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 05:11:35 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 05:11:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 05:11:44 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 05:15:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 05:15:10 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 24 05:21:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 05:21:48 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 24 05:22:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 05:22:01 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 05:23:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 05:23:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 05:25:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 05:25:12 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 24 05:29:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f00c61400, cur 1563971362 expire 1563971212 last 1563971135 Jul 24 05:31:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 05:31:51 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 24 05:32:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25d141ac00, cur 1563971567 expire 1563971417 last 1563971340 Jul 24 05:32:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 05:32:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 05:35:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 05:35:28 fir-md1-s1 kernel: Lustre: Skipped 23161 previous similar messages Jul 24 05:36:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 05:36:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 05:42:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 05:42:44 fir-md1-s1 kernel: Lustre: Skipped 23140 previous similar messages Jul 24 05:43:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 05:43:26 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 05:45:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 05:45:28 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 24 05:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 05:53:27 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 24 05:55:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 05:55:34 fir-md1-s1 kernel: Lustre: Skipped 106773 previous similar messages Jul 24 05:56:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 05:56:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 24 05:58:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 05:58:32 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 06:03:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 06:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 06:03:59 fir-md1-s1 kernel: Lustre: Skipped 106761 previous similar messages Jul 24 06:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 06:05:36 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 06:06:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 06:06:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 06:06:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 06:06:57 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 24 06:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 06:14:00 fir-md1-s1 kernel: Lustre: Skipped 32084 previous similar messages Jul 24 06:15:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 06:15:41 fir-md1-s1 kernel: Lustre: Skipped 32115 previous similar messages Jul 24 06:17:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 06:17:33 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 06:21:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 06:21:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 06:24:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 06:24:16 fir-md1-s1 kernel: Lustre: Skipped 26799 previous similar messages Jul 24 06:26:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 06:26:04 fir-md1-s1 kernel: Lustre: Skipped 26825 previous similar messages Jul 24 06:27:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 06:27:34 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 06:34:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 06:34:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 24 06:36:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 06:36:15 fir-md1-s1 kernel: Lustre: Skipped 842 previous similar messages Jul 24 06:36:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 06:36:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 06:36:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f44ce6af000, cur 1563975386 expire 1563975236 last 1563975159 Jul 24 06:38:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 06:38:34 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 24 06:44:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 06:44:28 fir-md1-s1 kernel: Lustre: Skipped 796 previous similar messages Jul 24 06:46:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 06:46:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 24 06:48:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 24 06:48:43 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 24 06:51:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 06:51:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 06:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 06:54:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 24 06:56:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f100b549000, cur 1563976613 expire 1563976463 last 1563976386 Jul 24 06:56:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 06:56:57 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 24 07:00:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 24 07:00:28 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 24 07:02:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 07:02:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 07:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 07:05:05 fir-md1-s1 kernel: Lustre: Skipped 23104 previous similar messages Jul 24 07:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 07:06:58 fir-md1-s1 kernel: Lustre: Skipped 23102 previous similar messages Jul 24 07:10:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 07:10:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 24 07:12:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 07:12:27 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 07:15:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 07:15:05 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 24 07:16:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25aba02400, cur 1563977816 expire 1563977666 last 1563977589 Jul 24 07:17:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 07:17:42 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 24 07:21:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 24 07:21:06 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 07:24:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 07:24:23 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 07:25:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 07:25:24 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 07:28:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 07:28:21 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 24 07:31:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 07:31:35 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 07:34:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 07:34:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 07:35:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 07:35:29 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 07:38:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 07:38:31 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 24 07:43:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 07:43:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 24 07:45:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 07:45:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 07:48:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 07:48:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 07:48:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 07:48:31 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 24 07:55:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 07:55:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 07:55:40 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 24 07:55:40 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 07:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 07:59:20 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 08:00:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 08:00:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 08:05:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 08:05:41 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 24 08:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 08:05:52 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 08:09:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 08:09:22 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 24 08:10:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 08:10:04 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 24 08:16:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 08:16:19 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 08:16:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 08:16:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 24 08:19:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 08:19:22 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 24 08:27:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 08:27:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 08:27:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous 
similar messages Jul 24 08:27:04 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 08:29:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 08:29:24 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 24 08:37:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 08:37:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 08:37:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 24 08:37:28 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 24 08:39:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 08:39:46 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 24 08:40:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 08:40:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 08:43:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 08:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client abea71ae-b956-1f71-0b98-5c238f1bb381 (at 10.9.107.63@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f45362da800, cur 1563983116 expire 1563982966 last 1563982889 Jul 24 08:47:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 08:47:32 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 24 08:47:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 08:47:40 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 08:49:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 08:49:47 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 24 08:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 08:57:43 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 08:57:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 08:57:50 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 24 08:59:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 08:59:52 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 24 09:05:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 09:06:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 09:07:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 09:07:57 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 24 09:08:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 09:08:39 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 24 09:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 09:10:16 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 24 09:13:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 09:17:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 24 09:17:57 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 09:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 09:18:43 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 24 09:20:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 09:20:22 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 24 09:29:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 09:29:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 24 09:30:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 09:30:28 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 24 09:31:05 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 09:31:05 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 24 09:35:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 09:39:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 09:39:18 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 09:39:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 09:40:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 09:40:29 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 24 09:44:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 09:44:09 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 24 09:46:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 09:50:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 09:50:04 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 09:51:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 09:51:07 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 24 09:52:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 09:54:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 09:54:17 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 09:55:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 09:55:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 09:58:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 10:00:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 10:00:39 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 10:01:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 10:01:11 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 24 10:04:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 10:04:43 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 24 10:05:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 10:05:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 10:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 10:10:53 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 10:12:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 10:12:29 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 24 10:14:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 10:14:45 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 24 10:16:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 10:16:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 10:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 10:21:13 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 10:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 10:22:29 fir-md1-s1 kernel: Lustre: Skipped 5914 previous similar messages Jul 24 10:26:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 10:26:23 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 24 10:27:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e42312800, cur 1563989271 expire 1563989121 last 1563989044 Jul 24 10:27:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 24 10:31:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 10:31:18 fir-md1-s1 kernel: Lustre: Skipped 125423 previous similar messages Jul 24 10:33:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 10:33:06 fir-md1-s1 kernel: Lustre: Skipped 119629 previous similar messages Jul 24 10:36:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 10:36:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 10:37:27 fir-md1-s1 kernel: Lustre: 23101:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1563989840/real 1563989840] req@ffff8f1e62818f00 x1636745103500064/t0(0) o105->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1563989847 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 10:39:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0af4f40a-317e-88ce-7d9c-c4839b78e5a4 (at 10.8.29.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14d392c800, cur 1563989984 expire 1563989834 last 1563989757 Jul 24 10:40:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 24 10:40:32 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 10:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 10:41:30 fir-md1-s1 kernel: Lustre: Skipped 154016 previous similar messages Jul 24 10:43:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 10:43:12 fir-md1-s1 kernel: Lustre: Skipped 154042 previous similar messages Jul 24 10:45:36 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1690079050 x1631353439882304/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:24/0 lens 488/448 e 1 to 0 dl 1563990354 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 10:45:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 10:46:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 24 10:46:49 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 10:50:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 10:50:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 24 10:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 10:51:50 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 24 10:53:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 10:53:40 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 24 10:56:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 10:56:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 11:02:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 11:02:45 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 11:05:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 11:05:50 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 24 11:05:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 11:05:51 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 11:09:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 11:09:26 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 11:12:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 11:12:50 fir-md1-s1 kernel: Lustre: Skipped 11405 previous similar messages Jul 24 11:16:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 11:16:15 fir-md1-s1 kernel: Lustre: Skipped 11437 previous similar messages Jul 24 11:17:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 11:17:34 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 11:22:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 11:22:54 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 11:23:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 11:23:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 11:26:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 11:26:19 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 24 11:28:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1edaa73400, cur 1563992909 expire 1563992759 last 1563992682 Jul 24 11:28:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 24 11:29:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 11:29:11 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 24 11:31:34 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1127246900 x1633661226062976/t0(0) o36->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:9/0 lens 504/448 e 1 to 0 dl 1563993099 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 11:31:34 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Jul 24 11:31:35 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06d50ebf00 x1638871542710368/t0(0) o101->357ed5e6-797d-063b-772c-730368f05495@10.9.103.26@o2ib4:10/0 lens 576/3264 e 1 to 0 dl 1563993100 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 11:31:35 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 259 previous similar messages Jul 24 11:31:36 fir-md1-s1 kernel: Lustre: 23567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10fdde0300 x1638966059964720/t0(0) o101->6a159b93-cbcb-a910-1e2c-6484b2bca678@10.9.103.18@o2ib4:11/0 lens 576/3264 e 1 to 0 dl 1563993101 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 11:31:36 fir-md1-s1 kernel: Lustre: 23567:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 127 previous similar messages Jul 24 11:31:38 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f137b23f200 x1638732722427376/t0(0) 
o101->159ddaf1-ce95-3830-127f-4856eec7f12f@10.9.116.1@o2ib4:13/0 lens 576/3264 e 1 to 0 dl 1563993103 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 11:31:38 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 89 previous similar messages Jul 24 11:31:42 fir-md1-s1 kernel: Lustre: 21418:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f109e6c6000 x1635102570569872/t0(0) o101->7933cca6-376e-2621-120f-991576fc8851@10.9.109.52@o2ib4:17/0 lens 576/3264 e 1 to 0 dl 1563993107 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 11:31:42 fir-md1-s1 kernel: Lustre: 21418:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 63 previous similar messages Jul 24 11:31:50 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1c66c6d100 x1634930994623232/t0(0) o101->8f367c70-6bbd-359c-a9cb-016bde9e7ec3@10.8.27.12@o2ib6:25/0 lens 576/0 e 1 to 0 dl 1563993115 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 24 11:31:50 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 287 previous similar messages Jul 24 11:32:06 fir-md1-s1 kernel: Lustre: 24578:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e1da81500 x1631586940069264/t0(0) o41->1feeaf0f-f950-19ca-4af6-be7b05afc879@10.8.27.18@o2ib6:11/0 lens 440/0 e 1 to 0 dl 1563993131 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 24 11:32:06 fir-md1-s1 kernel: Lustre: 24578:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 854 previous similar messages Jul 24 11:32:23 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:24s); client may timeout. 
req@ffff8f1299bc8f00 x1631715753081264/t0(0) o101->7dc1baf0-23aa-170f-06ea-1f337d7320ab@10.9.102.52@o2ib4:29/0 lens 576/0 e 1 to 0 dl 1563993119 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 24 11:32:23 fir-md1-s1 kernel: LustreError: 21669:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.101.54@o2ib4: deadline 30:1s ago req@ffff8f12320b3450 x1633915058448832/t0(0) o101->a7e9d272-b0d3-4359-c385-5d7a30e45350@10.9.101.54@o2ib4:22/0 lens 1768/0 e 0 to 0 dl 1563993142 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 24 11:32:23 fir-md1-s1 kernel: LustreError: 21669:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 24 11:32:23 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 24 11:32:23 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2296 previous similar messages Jul 24 11:32:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 11:32:59 fir-md1-s1 kernel: Lustre: Skipped 1020 previous similar messages Jul 24 11:34:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 11:34:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 11:36:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 11:36:36 fir-md1-s1 kernel: Lustre: Skipped 1050 previous similar messages Jul 24 11:39:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 11:39:23 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 24 11:43:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 11:43:13 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 24 11:45:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 11:45:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 11:46:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 11:46:46 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 11:50:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 11:50:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 24 11:53:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 11:53:26 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 24 11:57:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 11:57:38 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 24 12:01:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 
(no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 12:01:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 12:02:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 12:02:45 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 12:03:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 12:03:26 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 24 12:07:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 12:07:42 fir-md1-s1 kernel: Lustre: Skipped 154 previous similar messages Jul 24 12:12:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 12:12:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 12:13:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 12:13:02 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 24 12:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 12:13:37 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 24 12:17:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 12:17:43 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 24 12:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 12:23:56 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 24 12:24:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 12:24:17 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 24 12:27:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 12:27:52 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 24 12:31:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 12:31:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 12:34:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 12:34:39 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 12:35:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 12:35:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 24 12:39:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 12:39:27 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 24 12:41:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 12:41:47 fir-md1-s1 kernel: LustreError: Skipped 158 previous similar messages Jul 24 12:44:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 12:44:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 24 12:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 12:45:16 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 12:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 12:49:33 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 24 12:55:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 12:55:30 fir-md1-s1 kernel: Lustre: Skipped 22979 previous similar messages Jul 24 12:55:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Jul 24 12:55:57 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 24 12:59:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 12:59:34 fir-md1-s1 kernel: Lustre: Skipped 23015 previous similar messages Jul 24 12:59:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 12:59:50 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 13:05:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 13:05:43 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 24 13:06:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 13:06:44 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 24 13:09:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 13:09:56 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 24 13:13:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 13:13:06 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 13:15:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 13:15:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 13:17:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 13:17:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 24 13:20:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 13:20:02 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 24 13:26:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 13:26:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 13:27:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 13:27:50 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 13:27:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 13:27:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 13:30:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 13:30:08 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 24 13:36:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 13:36:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 13:38:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 13:38:10 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 24 13:38:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 13:38:33 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 13:40:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 13:40:24 fir-md1-s1 kernel: Lustre: Skipped 23386 previous similar messages Jul 24 13:46:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 13:46:30 fir-md1-s1 kernel: Lustre: Skipped 23373 previous similar messages Jul 24 13:51:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 13:51:04 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 24 13:51:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 13:51:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 24 13:54:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 13:56:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 13:56:46 fir-md1-s1 kernel: Lustre: Skipped 63456 previous similar messages Jul 24 14:01:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 14:01:09 fir-md1-s1 kernel: Lustre: Skipped 63492 previous similar messages Jul 24 14:02:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 14:02:09 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 14:06:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 14:06:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 24 14:07:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 14:07:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 14:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 14:11:21 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 24 14:14:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 14:14:05 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 14:16:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 14:16:58 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 24 14:18:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 14:18:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 14:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 14:21:43 fir-md1-s1 kernel: Lustre: Skipped 19038 previous similar messages Jul 24 14:22:10 fir-md1-s1 kernel: Lustre: 21541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bd4da6050 x1631353443780304/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:15/0 lens 488/448 e 1 to 0 dl 1564003335 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 14:22:10 fir-md1-s1 kernel: Lustre: 21541:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1017 previous similar messages Jul 24 14:22:15 fir-md1-s1 kernel: LustreError: 46531:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1bd4da6050 x1631353443780304/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:15/0 lens 488/448 e 1 to 0 dl 1564003335 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 24 14:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 14:24:37 fir-md1-s1 kernel: LustreError: 25631:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1d49370c50 x1631353443800352/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:6/0 lens 488/448 e 0 to 0 dl 1564003506 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 14:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 14:24:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 24 14:24:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 24 14:26:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 14:26:59 fir-md1-s1 kernel: Lustre: Skipped 19042 previous similar messages Jul 24 14:29:29 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da2c50 x1631353444035392/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:28/0 lens 488/448 e 0 to 0 dl 1564003798 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 14:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 14:30:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 14:30:09 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 24 14:31:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 14:31:50 fir-md1-s1 kernel: Lustre: Skipped 12601 previous similar messages Jul 24 14:37:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 14:37:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 14:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 14:37:09 fir-md1-s1 kernel: Lustre: Skipped 12564 previous similar messages Jul 24 14:42:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 14:42:32 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 24 14:43:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 14:43:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 14:43:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3996ab7800, cur 1564004639 expire 1564004489 last 1564004412 Jul 24 14:47:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 14:47:09 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 14:49:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 14:49:40 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 24 14:52:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 14:52:43 fir-md1-s1 kernel: Lustre: Skipped 82057 previous similar messages Jul 24 14:57:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 14:57:10 fir-md1-s1 kernel: Lustre: Skipped 120850 previous similar messages Jul 24 15:00:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 15:00:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 15:01:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 15:01:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 15:03:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 15:03:05 fir-md1-s1 kernel: Lustre: Skipped 38847 previous similar messages Jul 24 15:07:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 15:07:27 fir-md1-s1 kernel: Lustre: Skipped 41269 previous similar messages Jul 24 15:10:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 15:10:57 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 24 15:13:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 15:13:42 fir-md1-s1 kernel: Lustre: Skipped 41304 previous similar messages Jul 24 15:17:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 15:17:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 15:17:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 15:17:43 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 24 15:22:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 15:22:09 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 15:24:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 15:24:01 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 24 15:27:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 15:27:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 15:32:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 15:32:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 24 15:33:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 15:33:48 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 15:34:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 15:34:11 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 24 15:38:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 15:38:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 15:40:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 61ddca6a-058e-3513-aadd-c424d13d1651 (at 10.9.105.68@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1471d4a800, cur 1564008052 expire 1564007902 last 1564007825 Jul 24 15:43:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 15:43:04 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 15:44:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 15:44:26 fir-md1-s1 kernel: Lustre: Skipped 1143 previous similar messages Jul 24 15:45:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 15:45:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 15:48:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 15:48:27 fir-md1-s1 kernel: Lustre: Skipped 1116 previous similar messages Jul 24 15:50:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 242072b2-4e2c-1df0-2d94-14c4c0c52592 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f11bcb9c000, cur 1564008617 expire 1564008467 last 1564008390 Jul 24 15:50:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 24 15:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 15:54:30 fir-md1-s1 kernel: Lustre: Skipped 14813 previous similar messages Jul 24 15:54:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 15:54:44 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 15:58:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 15:58:42 fir-md1-s1 kernel: Lustre: Skipped 14813 previous similar messages Jul 24 16:04:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 16:04:33 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 24 16:05:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 16:05:00 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 24 16:08:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 16:08:50 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 24 16:09:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 16:09:58 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 16:13:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client adb45b19-6bc6-4f21-c97a-ba2af2ad2ae5 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f0d18a400, cur 1564010022 expire 1564009872 last 1564009795 Jul 24 16:13:42 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 16:14:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 16:14:37 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 24 16:18:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 16:18:31 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 24 16:18:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 16:18:53 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 16:22:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 16:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 16:24:46 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 24 16:26:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 16:28:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 16:29:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 16:29:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 24 16:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 16:29:13 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 16:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 16:34:58 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 24 16:39:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 16:39:27 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 24 16:42:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 16:42:36 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 16:44:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 16:45:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 16:45:03 fir-md1-s1 kernel: Lustre: Skipped 57521 previous similar messages Jul 24 16:46:44 fir-md1-s1 kernel: LustreError: 46579:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bd4da2c50 x1631353448867824/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:19/0 lens 488/448 e 0 to 0 dl 1564012009 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 16:46:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 16:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 16:50:02 fir-md1-s1 kernel: Lustre: Skipped 57509 previous similar messages Jul 24 16:53:02 fir-md1-s1 kernel: LustreError: 22649:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1690078c50 x1631353448913552/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:1/0 lens 488/448 e 0 to 0 dl 1564012411 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 16:53:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 16:53:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 16:53:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 16:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 16:55:03 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 24 16:55:26 fir-md1-s1 kernel: Lustre: 46558:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1bd4da6850 x1631353448931392/t0(0) 
o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:1/0 lens 488/448 e 0 to 0 dl 1564012531 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 16:55:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 16:55:31 fir-md1-s1 kernel: LustreError: 46549:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f1bd4da6850 x1631353448931392/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:1/0 lens 488/448 e 0 to 0 dl 1564012531 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 16:55:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 16:56:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 17:00:23 fir-md1-s1 kernel: Lustre: Skipped 48203 previous similar messages Jul 24 17:02:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:03:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 17:03:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 17:03:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 24 17:05:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 17:05:19 fir-md1-s1 kernel: Lustre: Skipped 48201 previous similar messages Jul 24 17:11:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 17:11:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 24 17:11:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:12:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:13:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 17:14:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 17:14:23 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 17:15:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 17:15:24 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 24 17:21:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 17:21:40 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 17:23:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:25:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 17:25:36 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 24 17:26:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 17:26:11 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 24 17:26:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 17:31:49 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 24 17:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 17:35:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:35:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 17:35:53 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 24 17:36:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 17:36:51 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 17:38:36 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015109/real 1564015109] req@ffff8f243608b000 x1636745416869648/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015116 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:39:05 fir-md1-s1 kernel: Lustre: 21678:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015138/real 1564015138] req@ffff8f436a52b900 x1636745417088880/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015145 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:39:46 fir-md1-s1 kernel: Lustre: 21332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015179/real 1564015179] req@ffff8f293bba8300 x1636745417487184/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015186 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:40:15 fir-md1-s1 kernel: Lustre: 21459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015208/real 1564015208] req@ffff8f1e8bfe4200 x1636745417761680/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015215 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:40:37 fir-md1-s1 kernel: Lustre: 23718:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015230/real 1564015230] req@ffff8f319aae1200 
x1636745418014464/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015237 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:41:26 fir-md1-s1 kernel: Lustre: 23732:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015279/real 1564015279] req@ffff8f3db4617500 x1636745418548928/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015286 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:42:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 17:42:06 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 24 17:43:00 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564015373/real 1564015373] req@ffff8f17671af800 x1636745419234944/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564015380 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 24 17:43:00 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 24 17:43:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 17:43:09 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 24 17:43:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c138873-fb76-0026-68e2-057fa6923b0f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27dae2d000, cur 1564015393 expire 1564015243 last 1564015166 Jul 24 17:43:13 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 24 17:46:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 17:46:02 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 24 17:50:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 17:50:11 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 17:52:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 17:52:22 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 24 17:56:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 619e7d93-7664-eb46-de87-f756c43aa2ee (at 10.9.109.20@o2ib4) Jul 24 17:56:07 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 24 18:00:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 18:00:15 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 24 18:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 18:02:24 fir-md1-s1 kernel: Lustre: Skipped 12269 previous similar messages Jul 24 18:03:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 18:03:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 18:04:55 fir-md1-s1 kernel: Lustre: 23720:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2692e64b00 x1635348607275808/t0(0) o101->c1c54f8a-db68-72ea-1f4f-3dc905e7ab7d@10.8.1.16@o2ib6:0/0 lens 480/568 e 0 to 0 dl 1564016700 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 18:05:05 fir-md1-s1 kernel: Lustre: 20722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f17e0360f00 x1636424858283520/t0(0) o101->304180e1-aa68-a4a4-ed4c-9536f53351a5@10.8.1.21@o2ib6:10/0 lens 480/568 e 0 to 0 dl 1564016710 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 18:05:09 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.1.20@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2ba636fbc0/0x5d9ee6834c3c25cc lrc: 3/0,0 mode: PW/PW res: [0x20002891d:0x2:0x0].0x0 bits 0x40/0x0 rrc: 14 type: IBT flags: 0x60200400000020 nid: 10.8.1.20@o2ib6 remote: 0x5e4237d029d4ec59 expref: 24 pid: 23720 timeout: 3131769 lvb_type: 0 Jul 24 18:05:10 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.1.20@o2ib6 arrived at 1564016710 with bad export cookie 6746082289093310153 Jul 24 18:05:10 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 483 previous similar messages Jul 24 18:05:11 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 24 18:05:11 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 24 18:05:37 fir-md1-s1 kernel: Lustre: 23678:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3372393f00 
x1636422400165680/t0(0) o101->b5d37fef-ba24-e714-aa45-15692218e88e@10.8.1.20@o2ib6:12/0 lens 480/568 e 0 to 0 dl 1564016742 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 18:05:39 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.1.16@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2b7e6418c0/0x5d9ee6834c3c56da lrc: 3/0,0 mode: PW/PW res: [0x20002891d:0x2:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.1.16@o2ib6 remote: 0x5be30ca19dfc3bc2 expref: 25 pid: 23616 timeout: 3131799 lvb_type: 0 Jul 24 18:06:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 18:06:23 fir-md1-s1 kernel: Lustre: Skipped 12282 previous similar messages Jul 24 18:12:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 18:12:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 18:12:37 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 24 18:12:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 18:12:38 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 18:16:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 18:16:37 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 18:16:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 18:16:38 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 24 18:22:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 18:22:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 18:22:53 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 24 18:24:30 fir-md1-s1 kernel: Lustre: 46547:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bd4da5050 x1631353450570368/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:5/0 lens 504/448 e 1 to 0 dl 1564017875 ref 2 fl Interpret:/0/0 rc 0/0 Jul 24 18:24:35 fir-md1-s1 kernel: LustreError: 21541:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1bd4da5050 x1631353450570368/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:5/0 lens 504/448 e 1 to 0 dl 1564017875 ref 1 fl Interpret:/0/0 rc 0/0 Jul 24 18:24:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 24 18:26:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 18:26:11 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 18:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7d25b8d3-7faa-429c-fc44-b23ba26e43fa (at 10.8.17.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3539437000, cur 1564018006 expire 1564017856 last 1564017779 Jul 24 18:26:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 24 18:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 18:27:10 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 24 18:32:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 18:32:55 fir-md1-s1 kernel: Lustre: Skipped 88466 previous similar messages Jul 24 18:33:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 18:33:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 18:36:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 18:36:43 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 18:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 18:37:15 fir-md1-s1 kernel: Lustre: Skipped 88488 previous similar messages Jul 24 18:43:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 18:43:15 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 24 18:47:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 18:47:18 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 24 18:47:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 18:47:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 18:53:33 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 18:53:33 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 18:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 18:57:18 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 24 18:59:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 18:59:36 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 24 19:03:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 19:03:34 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 24 19:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 19:07:21 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 24 19:09:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.106.69@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 19:09:17 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 19:10:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 19:10:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 24 19:10:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.26@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 19:10:32 fir-md1-s1 kernel: LustreError: Skipped 973 previous similar messages Jul 24 19:13:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 19:13:20 fir-md1-s1 kernel: LustreError: Skipped 487 previous similar messages Jul 24 19:14:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 19:14:22 fir-md1-s1 kernel: Lustre: Skipped 21999 previous similar messages Jul 24 19:18:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 19:18:01 fir-md1-s1 kernel: Lustre: Skipped 22008 previous similar messages Jul 24 19:18:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 19:18:24 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 24 19:23:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 19:23:56 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 24 19:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 19:25:31 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 24 19:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 19:28:25 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 24 19:29:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 19:29:37 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 19:34:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 19:34:23 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 24 19:35:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 19:35:41 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 24 19:38:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 24 19:38:26 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 24 19:40:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 19:40:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 19:44:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 19:44:32 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 19:45:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 19:45:59 fir-md1-s1 kernel: Lustre: Skipped 13405 previous similar messages Jul 24 19:48:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 19:48:49 fir-md1-s1 kernel: Lustre: Skipped 13423 previous similar messages Jul 24 19:52:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f26a8b40400, cur 1564023120 expire 1564022970 last 1564022893 Jul 24 19:52:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 24 19:56:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 24 19:56:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 24 19:56:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 19:56:17 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 24 19:56:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 19:56:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 19:59:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 19:59:06 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 24 20:04:14 fir-md1-s1 kernel: perf: interrupt took too long (4965 > 4936), lowering kernel.perf_event_max_sample_rate to 40000 Jul 24 20:06:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 24 20:06:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 20:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 20:06:29 fir-md1-s1 kernel: Lustre: Skipped 127478 previous similar messages Jul 24 20:06:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 20:06:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 20:09:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 20:09:27 fir-md1-s1 kernel: Lustre: Skipped 127498 previous similar messages Jul 24 20:12:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d66c9f000, cur 1564024331 expire 1564024181 last 1564024104 Jul 24 20:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 20:16:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 24 20:17:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 20:17:00 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 24 20:17:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 20:17:28 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 20:19:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 20:19:36 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 24 20:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f158651c800, cur 1564025119 expire 1564024969 last 1564024892 Jul 24 20:26:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 20:26:41 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 20:27:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 20:27:06 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 20:29:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 20:29:44 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 24 20:33:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 20:33:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 20:36:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 24 20:36:57 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 24 20:37:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 20:37:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 20:40:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 20:40:08 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 24 20:47:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 20:47:03 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 24 20:47:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 20:47:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 24 20:48:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 24 20:48:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 24 20:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 20:50:19 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 24 20:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 20:57:24 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 20:59:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 20:59:29 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 21:00:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 21:00:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 21:00:33 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 24 21:07:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 21:07:38 fir-md1-s1 kernel: Lustre: Skipped 101977 previous similar messages Jul 24 21:10:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 21:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 21:10:39 fir-md1-s1 kernel: Lustre: Skipped 102008 previous similar messages Jul 24 21:12:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 21:12:16 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 21:16:22 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 24 21:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 24 21:18:48 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 21:20:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 21:20:17 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 21:20:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 21:20:58 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 24 21:22:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 21:22:44 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 21:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 21:29:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 24 21:30:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 21:31:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 21:31:21 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 24 21:34:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 21:34:38 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 24 21:39:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 21:39:26 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 21:41:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 21:41:24 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 24 21:45:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 21:45:07 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 24 21:47:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 21:47:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 21:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 21:49:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 24 21:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 21:51:43 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 24 21:56:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 21:56:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 21:59:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 21:59:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 24 22:00:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 22:00:43 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 24 22:01:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 22:01:49 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 24 22:07:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 22:07:27 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 24 22:11:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 22:11:26 fir-md1-s1 kernel: Lustre: Skipped 5947 previous similar messages Jul 24 22:11:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 22:11:49 fir-md1-s1 kernel: Lustre: Skipped 5970 previous similar messages Jul 24 22:18:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 22:18:20 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 24 22:19:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 22:19:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 22:21:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 22:21:58 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 22:21:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 24 22:21:58 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 24 22:28:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 22:28:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 24 22:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 22:32:06 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 22:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 22:32:06 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 24 22:38:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 22:38:54 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 24 22:42:20 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 22:42:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 22:42:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 22:42:20 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 22:42:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 22:42:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 22:44:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 22:44:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 22:48:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 22:48:59 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 24 22:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 22:52:25 fir-md1-s1 kernel: Lustre: Skipped 39298 previous similar messages Jul 24 22:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 22:52:25 fir-md1-s1 kernel: Lustre: Skipped 39305 previous similar messages Jul 24 22:52:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 22:59:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 22:59:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 24 23:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 23:02:34 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 24 23:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 23:02:34 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 24 23:09:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 23:09:19 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 24 23:12:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 23:12:36 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 24 23:13:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 24 23:13:17 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 24 23:19:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 23:19:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 23:19:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 23:19:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 24 23:20:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 24 23:22:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 24 23:22:38 fir-md1-s1 kernel: Lustre: Skipped 28192 previous similar messages Jul 24 23:23:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 23:23:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 24 23:23:29 fir-md1-s1 kernel: Lustre: Skipped 28165 previous similar messages Jul 24 23:26:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fbaa68800, cur 1564035973 expire 1564035823 last 1564035746 Jul 24 23:26:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 23:29:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 23:29:55 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 24 23:33:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 23:33:10 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 24 23:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 23:33:36 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 24 23:36:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client aba5d4eb-e07c-9b0f-6ab5-7f97caf38a26 (at 10.8.16.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2504e7e400, cur 1564036607 expire 1564036457 last 1564036380 Jul 24 23:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aba5d4eb-e07c-9b0f-6ab5-7f97caf38a26 (at 10.8.16.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24efef8400, cur 1564036621 expire 1564036471 last 1564036394 Jul 24 23:37:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 24 23:40:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 24 23:40:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 24 23:40:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 24 23:40:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 24 23:43:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 24 23:43:14 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 24 23:44:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 23:44:09 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 24 23:51:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 24 23:51:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 24 23:52:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 24 23:52:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 24 23:53:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 24 23:53:24 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 24 23:54:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 24 23:54:57 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 24 23:56:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fdb66000, cur 1564037800 expire 1564037650 last 1564037573 Jul 25 00:01:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 00:01:17 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 25 00:03:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 00:03:42 fir-md1-s1 kernel: Lustre: Skipped 8537 previous similar messages Jul 25 00:05:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 00:05:14 fir-md1-s1 kernel: Lustre: Skipped 8509 previous similar messages Jul 25 00:11:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 00:11:40 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 25 00:13:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 00:13:44 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 25 00:15:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 00:15:40 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 00:17:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 00:17:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 00:18:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 00:21:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 00:21:40 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 25 00:23:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 00:23:54 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 25 00:25:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 00:25:42 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 00:30:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 00:30:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 00:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 00:34:00 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 25 00:35:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 00:35:47 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 25 00:35:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 00:35:56 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 25 00:40:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 00:44:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 00:44:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 00:45:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 00:45:57 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 25 00:46:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 00:46:24 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 25 00:54:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 00:54:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 00:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 00:56:00 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 00:57:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 25 00:57:16 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 25 01:01:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 01:04:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 01:04:32 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 01:06:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 01:06:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 25 01:08:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 25 01:08:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 01:12:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 01:14:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 01:14:37 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 25 01:15:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 01:15:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 01:16:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 01:16:42 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 01:18:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 01:18:49 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 25 01:23:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 25 01:23:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 01:24:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 01:24:37 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 25 01:26:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 01:26:45 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 25 01:28:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 01:28:56 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 25 01:34:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 01:34:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 01:35:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 01:35:08 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 01:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 01:37:12 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 25 01:38:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 01:38:57 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 01:45:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 01:45:24 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 01:45:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 01:45:25 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 25 01:47:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 01:47:26 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 01:49:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 01:49:12 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 25 01:55:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 01:55:32 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 25 01:57:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 01:57:36 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 25 01:57:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 01:57:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 01:59:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 01:59:54 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 25 02:05:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 02:05:36 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 25 02:06:10 fir-md1-s1 kernel: Lustre: 10148:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f14fbf800 x1634182782033648/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:15/0 lens 480/568 e 0 to 0 dl 1564045575 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:07:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 02:07:46 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 02:07:50 fir-md1-s1 kernel: Lustre: 97638:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f167a669200 x1634182782968912/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:25/0 lens 480/568 e 0 to 0 dl 1564045675 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:09:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 02:09:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 02:10:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 02:10:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 25 02:15:16 fir-md1-s1 kernel: Lustre: 23615:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f11a49bc200 x1634182787771200/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:21/0 lens 480/568 e 0 to 0 dl 1564046121 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:15:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 02:15:39 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 25 02:16:29 fir-md1-s1 kernel: Lustre: 97644:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f1aa0c62d00 x1634182788346208/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:4/0 lens 480/568 e 0 to 0 dl 1564046194 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:17:28 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f22cefe1500 x1634182788802192/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:3/0 lens 480/568 e 0 to 0 dl 1564046253 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:17:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 02:17:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 02:20:16 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f2507971200 x1634182790309296/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:21/0 lens 
480/568 e 0 to 0 dl 1564046421 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:20:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 02:20:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 02:21:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 25 02:21:46 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 25 02:23:15 fir-md1-s1 kernel: Lustre: 23667:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3ea4f35100 x1634182791676720/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:20/0 lens 480/568 e 0 to 0 dl 1564046600 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:23:27 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:23:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:25:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 02:25:57 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 25 02:27:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 02:27:51 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 25 02:31:41 fir-md1-s1 kernel: Lustre: 21127:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34e07ee300 x1634182795687872/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:16/0 lens 480/568 e 1 to 0 dl 1564047106 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 
02:31:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 02:31:47 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 02:32:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 02:32:39 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 02:33:16 fir-md1-s1 kernel: Lustre: 23718:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f4076fbb000 x1634182796493312/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:21/0 lens 488/568 e 1 to 0 dl 1564047201 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:33:16 fir-md1-s1 kernel: Lustre: 23718:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 25 02:36:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 02:36:02 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 25 02:38:19 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:38:30 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:38:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 02:38:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 02:38:44 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:39:06 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in 
inconsistent state, don't perform health checking (-125, 0) Jul 25 02:39:06 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 25 02:39:26 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:39:26 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 25 02:39:43 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:40:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 02:40:15 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 25 02:41:44 fir-md1-s1 kernel: Lustre: 50576:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f39a42b3f00 x1634182803009728/t0(0) o101->185d31e3-2aa7-c8dc-f4ab-116af2588723@10.9.109.14@o2ib4:19/0 lens 480/568 e 0 to 0 dl 1564047709 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 02:41:44 fir-md1-s1 kernel: Lustre: 50576:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 25 02:42:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 02:42:43 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 25 02:43:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 02:43:26 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 02:46:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 02:46:58 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 25 02:48:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 02:48:37 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 02:54:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 25 02:54:22 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 25 02:54:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 02:54:43 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 02:57:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 02:57:09 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 25 02:59:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 02:59:24 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 25 03:07:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 03:07:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 03:07:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 03:07:15 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 25 03:08:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 03:08:17 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 03:10:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 03:10:22 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 25 03:17:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 03:17:26 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 03:17:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 25 03:17:54 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 03:20:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 03:20:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 03:20:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 03:20:27 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 25 03:27:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 03:27:58 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 03:30:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 03:30:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 25 03:30:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 03:30:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 03:30:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 03:30:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 03:37:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1b3ffbf800, cur 1564051065 expire 1564050915 last 1564050838 Jul 25 03:38:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 03:38:20 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 03:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 03:41:12 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 25 03:41:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 03:41:13 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 03:48:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 03:48:23 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 25 03:50:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 03:50:54 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 03:51:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 03:51:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 03:51:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 03:51:42 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 25 03:53:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 03:53:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 03:56:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 03:58:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 03:58:24 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 25 04:01:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 04:01:39 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 04:01:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 04:01:48 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 04:04:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 04:04:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 04:08:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 04:08:46 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 25 04:12:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 25 04:12:33 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 25 04:13:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 04:13:40 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 04:14:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 04:14:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 04:18:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 04:18:49 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 25 04:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 04:23:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 25 04:24:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 04:24:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 04:27:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 25 04:27:28 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 25 04:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3684a75400, cur 1564054098 expire 1564053948 last 1564053871 Jul 25 04:29:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 04:29:02 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 04:33:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 04:33:42 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 25 04:34:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 04:34:56 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 04:38:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 04:38:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 04:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 04:39:09 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 25 04:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 04:44:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 25 04:45:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 04:45:51 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 04:48:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 04:48:05 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 25 04:49:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 04:49:16 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 25 04:54:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 04:54:15 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 04:56:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 04:56:59 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 04:59:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 04:59:18 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 25 04:59:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 04:59:20 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 25 05:04:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 05:04:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 05:09:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 05:09:20 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 25 05:09:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 05:09:56 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 05:10:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 05:10:22 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 25 05:16:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 05:16:13 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 05:19:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 05:19:50 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 25 05:20:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 05:20:47 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 25 05:22:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 05:22:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 25 05:26:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 05:26:23 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 25 05:29:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 05:29:55 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 05:31:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 05:31:07 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 05:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 05:37:10 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 25 05:40:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 05:40:01 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 05:40:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 05:40:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 05:41:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 05:41:16 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 05:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 05:47:24 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 25 05:50:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 05:50:01 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 25 05:53:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 05:53:08 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 25 05:55:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 05:55:52 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 05:57:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 05:57:27 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 06:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 06:00:02 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 25 06:03:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 06:03:33 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 25 06:06:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 06:06:16 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 06:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 06:07:34 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 06:10:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 06:10:02 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 25 06:12:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2871c8a000, cur 1564060352 expire 1564060202 last 1564060125 Jul 25 06:13:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 06:13:37 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 06:17:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 06:17:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 06:18:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 06:18:04 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 25 06:20:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 06:20:55 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 25 06:23:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 06:23:40 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 25 06:27:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 06:27:53 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 25 06:28:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 06:28:23 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 06:31:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 06:31:40 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 25 06:36:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 06:36:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 06:37:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 06:37:59 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 25 06:38:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 06:38:33 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 06:41:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 06:41:55 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 25 06:47:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 06:47:15 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 25 06:48:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 06:48:01 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 06:48:39 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f399b232800, cur 1564062519 expire 1564062369 last 1564062292 Jul 25 06:51:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 06:51:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 06:52:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 06:52:05 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 06:57:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 06:57:18 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 06:58:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 06:58:14 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 25 07:02:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 07:02:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 25 07:03:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 07:03:08 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 07:07:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 07:07:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 25 07:08:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 07:08:19 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 07:12:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 07:12:32 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 25 07:14:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 07:14:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 07:17:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 07:17:52 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 25 07:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 07:18:21 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 07:22:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 07:22:32 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 25 07:24:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 07:24:20 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 07:27:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 07:27:52 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 25 07:28:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 07:28:56 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 07:32:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 07:32:37 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 25 07:35:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 07:35:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 07:38:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 07:38:23 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 25 07:38:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 07:38:57 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 07:43:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 07:43:19 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 25 07:45:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 07:45:52 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 07:48:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 07:48:55 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 25 07:49:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 07:49:17 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 07:53:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 07:53:25 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 07:56:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 07:56:52 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 07:59:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 07:59:47 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 07:59:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 07:59:48 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 08:02:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1b8c206000, cur 1564066923 expire 1564066773 last 1564066696 Jul 25 08:03:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 08:03:26 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 25 08:07:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 08:07:05 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 08:10:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 08:10:34 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 08:10:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 08:10:47 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 08:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 08:13:45 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 25 08:16:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45292d6800, cur 1564067766 expire 1564067616 last 1564067539 Jul 25 08:18:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 08:18:35 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 08:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ec32bc800, cur 1564067962 expire 1564067812 last 1564067735 Jul 25 08:20:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 08:20:49 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 25 08:20:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 08:20:57 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 08:23:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 08:23:49 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 25 08:29:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 08:29:55 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 08:30:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 08:30:53 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 08:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 08:31:04 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 08:33:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 08:33:51 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 25 08:41:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 08:41:27 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 25 08:41:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) 
reconnecting Jul 25 08:41:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 08:43:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 08:43:43 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 25 08:43:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 08:43:52 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 25 08:51:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 08:51:37 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 25 08:52:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 08:52:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 08:54:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 08:54:25 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 25 08:55:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 08:55:58 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 09:01:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3fa422ac00, cur 1564070494 expire 1564070344 last 1564070267 Jul 25 09:02:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 09:02:04 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 09:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 09:02:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 09:04:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 09:04:45 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 09:07:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 09:07:51 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 09:12:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 09:12:10 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 25 09:13:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 09:13:18 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 09:15:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 09:15:25 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 25 09:18:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 09:18:11 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 09:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 09:23:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 09:23:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 09:23:47 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 25 09:25:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 09:25:30 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 25 09:31:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 09:31:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 09:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 09:33:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 09:33:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 25 09:33:50 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 09:35:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 09:35:38 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 25 09:42:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 09:42:33 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 25 09:43:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 09:43:56 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 09:44:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3c6520fc00, cur 1564073070 expire 1564072920 last 1564072843 Jul 25 09:45:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 09:45:23 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 09:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f340f8b9800, cur 1564073127 expire 1564072977 last 1564072900 Jul 25 09:45:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 09:45:44 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 09:52:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 09:52:49 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 09:54:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 09:54:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 09:55:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 09:55:27 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 25 09:55:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 09:55:48 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 25 10:03:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 10:03:35 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 10:04:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 10:04:26 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 10:05:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 10:05:33 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 25 10:05:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 10:05:54 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 25 10:14:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 10:14:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 10:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 10:14:32 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 25 10:15:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 10:15:59 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 10:18:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 10:18:23 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 25 10:24:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 10:24:50 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 25 10:25:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 10:25:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 10:26:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 10:26:02 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 25 10:28:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 10:28:25 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 10:35:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 10:35:13 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 10:35:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 10:35:46 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 10:37:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 10:37:28 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 25 10:39:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 10:39:59 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 10:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 10:45:23 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 25 10:47:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 10:47:29 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 25 10:48:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 10:48:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 10:52:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 10:52:09 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 25 10:55:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 10:55:24 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 10:57:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 10:57:46 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 25 10:58:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 10:58:53 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 11:03:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 11:03:25 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 25 11:05:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 11:05:33 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 25 11:07:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 11:07:47 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 11:09:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 11:09:09 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 11:12:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2802b18c00, cur 1564078321 expire 1564078171 last 1564078094 Jul 25 11:13:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 11:13:36 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 25 11:14:32 fir-md1-s1 kernel: Lustre: 23741:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26fae14e00 x1631353465256128/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 480/568 e 1 to 0 dl 1564078477 ref 2 fl Interpret:/0/0 rc 0/0 Jul 25 11:14:32 fir-md1-s1 kernel: Lustre: 23741:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 25 11:14:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f9d9a33c0/0x5d9ee685438197ce lrc: 3/0,0 mode: PR/PR res: [0x2c002c662:0x2e:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2267de8f5 expref: 42 pid: 23588 timeout: 3193546 lvb_type: 0 Jul 25 11:14:46 fir-md1-s1 kernel: LustreError: 23588:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f17ce68c800 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f9d9a5340/0x5d9ee6854381986f lrc: 3/0,0 mode: PW/PW res: [0x2c002c662:0x2e:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2267de903 expref: 23 pid: 23588 timeout: 0 lvb_type: 0 Jul 25 11:14:46 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request 
took longer than estimated (20:9s); client may timeout. req@ffff8f26fae14e00 x1631353465256128/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 480/536 e 1 to 0 dl 1564078477 ref 1 fl Complete:/0/0 rc -107/-107 Jul 25 11:15:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 11:15:34 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 11:17:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 11:17:49 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 11:19:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 11:19:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 11:23:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 11:23:38 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 25 11:26:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 11:26:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 11:27:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 11:27:54 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 25 11:32:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 11:32:06 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 11:33:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 11:33:43 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 25 11:36:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 11:36:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 11:37:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 11:37:59 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 25 11:42:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 11:42:41 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 11:44:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 11:44:38 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 25 11:47:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 11:47:02 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 11:48:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 11:48:05 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 25 11:51:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2b1522cc00, cur 1564080695 expire 1564080545 last 1564080468 Jul 25 11:53:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 11:53:01 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 11:55:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 11:55:17 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 25 11:57:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 11:57:33 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 11:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 11:58:34 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 25 12:03:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 12:03:56 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 12:05:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 12:05:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 12:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 12:07:34 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 12:08:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 12:08:37 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 25 12:15:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 12:15:18 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 12:16:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 12:16:41 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 25 12:18:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 12:18:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 12:18:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 12:18:39 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 25 12:24:52 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 12:25:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 25 12:25:42 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 12:25:49 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 12:25:49 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 25 12:26:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 12:27:20 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 25 12:28:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 12:28:26 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 25 12:28:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 12:28:27 fir-md1-s1 kernel: Lustre: Skipped 83682 previous similar messages Jul 25 12:28:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 12:28:48 fir-md1-s1 kernel: Lustre: Skipped 83723 previous similar messages Jul 25 12:37:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 12:37:35 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 12:38:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 12:38:32 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 12:39:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 12:39:00 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 25 12:40:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 12:40:42 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 12:48:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 12:48:49 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 12:49:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 12:49:02 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 25 12:49:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 12:49:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 12:53:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 12:53:00 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 12:59:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 12:59:26 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 25 12:59:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 12:59:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 13:01:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 13:01:29 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 13:03:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 13:03:03 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 13:09:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 13:09:27 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 25 13:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 13:09:42 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 25 13:11:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 13:11:32 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 13:13:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 25 13:13:36 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 13:19:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 13:19:27 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 25 13:19:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 13:19:53 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 13:21:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 13:21:33 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 13:24:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 13:24:21 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 25 13:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 13:29:29 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 25 13:30:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 13:30:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 13:34:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 13:34:39 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 25 13:35:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 13:35:22 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 13:39:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 13:39:30 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 25 13:40:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 13:40:53 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 13:41:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f208132e400, cur 1564087270 expire 1564087120 last 1564087043 Jul 25 13:47:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 25 13:47:10 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 13:47:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f198436a000, cur 1564087647 expire 1564087497 last 1564087420 Jul 25 13:47:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 13:47:41 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 13:49:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 13:49:56 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 25 13:51:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 13:51:12 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 25 13:59:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 13:59:23 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 13:59:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 25 13:59:26 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 14:00:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 14:00:05 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 25 14:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 14:01:16 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 14:09:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 14:09:39 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 14:10:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 14:10:14 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 25 14:10:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 14:10:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 14:11:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 14:11:26 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 14:20:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 14:20:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 14:20:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 14:20:39 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 25 14:20:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 14:20:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 14:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 14:21:41 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 25 14:29:52 fir-md1-s1 kernel: Lustre: 20541:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564090185/real 1564090185] req@ffff8f0cdeb93300 x1636746037061856/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564090192 ref 1 fl 
Rpc:X/0/ffffffff rc 0/-1 Jul 25 14:30:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 14:30:29 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 14:30:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 14:30:49 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 25 14:32:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 14:32:01 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 14:32:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 14:32:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 14:40:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 14:40:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 14:41:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 14:41:46 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 25 14:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 14:43:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 14:44:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 14:44:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 14:49:57 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 14:49:57 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 544 previous similar messages Jul 25 14:51:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 14:51:42 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 25 14:51:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 14:51:53 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 25 14:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 14:53:06 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 14:53:35 fir-md1-s1 kernel: Lustre: 20457:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564091608/real 1564091608] req@ffff8f0939553f00 x1636746076365136/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564091615 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 25 14:54:13 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 14:54:13 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 25 14:54:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 14:54:53 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 25 15:01:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 15:01:55 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 25 15:03:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 15:03:12 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 15:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 15:03:13 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 15:06:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 15:06:21 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 15:10:59 fir-md1-s1 kernel: Lustre: 20457:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 15:10:59 fir-md1-s1 kernel: Lustre: 20457:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Jul 25 15:11:01 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 15:11:01 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jul 25 15:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 15:11:58 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 25 15:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 15:13:25 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 15:14:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 15:14:07 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 25 15:15:11 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 15:16:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 15:16:36 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 25 15:17:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a0e9ca35-ffdd-0487-a0d6-eb22eb9bb125 (at 10.8.27.27@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19cfd91000, cur 1564093049 expire 1564092899 last 1564092822 Jul 25 15:17:43 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 15:17:43 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Jul 25 15:18:47 fir-md1-s1 kernel: Lustre: 23685:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564093120/real 1564093120] req@ffff8f2601478300 x1636746100393920/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564093127 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 25 15:22:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 15:22:07 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 25 15:23:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 15:23:35 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 15:24:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 15:24:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 15:26:35 fir-md1-s1 kernel: Lustre: 23602:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564093588/real 1564093588] req@ffff8f136e276c00 x1636746107864416/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564093595 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 25 15:26:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 15:26:42 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 15:28:12 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 15:28:52 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 15:28:52 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 25 15:30:53 fir-md1-s1 kernel: Lustre: 23593:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 15:30:53 fir-md1-s1 kernel: Lustre: 23593:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jul 25 15:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 15:32:19 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 25 15:32:36 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 15:32:36 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 25 15:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 15:34:21 
fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 15:34:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 15:34:39 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 15:36:21 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 15:37:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 15:37:32 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 15:41:04 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 15:41:04 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Jul 25 15:42:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 15:42:46 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 25 15:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 15:44:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 15:46:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 15:46:30 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 15:47:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 15:47:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 15:49:47 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 15:49:47 fir-md1-s1 kernel: Lustre: 23672:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jul 25 15:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 15:52:58 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 25 15:55:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 15:55:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 15:56:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 15:56:31 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 15:57:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 15:57:49 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 16:02:36 fir-md1-s1 kernel: Lustre: 21669:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564095749/real 1564095749] req@ffff8f13ca163900 x1636746142328096/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564095756 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 25 16:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 16:03:04 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 25 16:06:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 16:06:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 16:07:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 16:07:20 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 16:08:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 16:08:18 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 25 16:13:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 16:13:07 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 25 16:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2a03c9cc00, cur 1564096517 expire 1564096367 last 1564096290 Jul 25 16:15:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 25 16:18:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 16:18:00 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 16:18:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 16:18:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 16:19:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 16:19:14 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 16:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 16:23:21 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 25 16:26:35 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2bd9289800, cur 1564097195 expire 1564097045 last 1564096968 Jul 25 16:28:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 16:28:38 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 16:30:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 25 16:30:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 16:33:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 16:33:22 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 25 16:34:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 16:34:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 16:39:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 16:39:01 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 25 16:43:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 16:43:05 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 25 16:43:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 16:43:28 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 25 16:45:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 16:45:10 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 16:45:27 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:45:27 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 25 16:45:32 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:45:42 fir-md1-s1 kernel: Lustre: 23602:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:46:33 fir-md1-s1 kernel: Lustre: 23557:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:47:22 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:47:22 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Jul 25 16:48:48 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:48:48 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jul 25 16:49:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 16:49:04 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 16:51:31 fir-md1-s1 kernel: Lustre: 10308:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 16:53:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 16:53:46 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 16:53:46 fir-md1-s1 
kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 16:53:46 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 25 16:56:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 16:56:16 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 16:59:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 16:59:19 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 17:00:13 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 17:00:13 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Jul 25 17:03:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 17:03:52 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 25 17:03:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 17:03:52 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 25 17:06:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 17:06:40 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 17:09:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 17:09:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 25 17:13:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 17:13:57 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 17:13:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 17:13:57 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 25 17:17:56 fir-md1-s1 kernel: Lustre: 20983:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 17:18:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 17:18:28 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 17:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c62d3a800, cur 1564100351 expire 1564100201 last 1564100124 Jul 25 17:20:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 17:20:45 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 25 17:24:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 17:24:03 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 17:24:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 17:24:03 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 25 17:28:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 17:28:48 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 17:30:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 17:30:52 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 25 17:34:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 17:34:10 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 25 17:34:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 17:34:38 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 25 17:37:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b6a653a0-fffa-17d4-6340-7f18956e24df (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f21769d6800, cur 1564101448 expire 1564101298 last 1564101221 Jul 25 17:38:53 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 17:38:53 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Jul 25 17:39:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 17:39:07 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 17:40:15 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 17:40:15 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 25 17:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 17:41:03 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 17:44:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 17:44:17 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 25 17:44:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 17:44:40 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 17:49:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 17:49:23 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 17:50:52 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 17:50:52 fir-md1-s1 kernel: Lustre: 23569:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Jul 25 17:51:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 17:51:22 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 17:54:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 17:54:31 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 25 17:55:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 25 17:55:24 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 17:57:54 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 17:57:54 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 22 previous similar messages Jul 25 18:00:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 18:00:43 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 18:02:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 18:02:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 18:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 18:04:51 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 25 18:05:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 18:05:25 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 18:08:30 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 18:08:30 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 77 previous similar messages Jul 25 18:11:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 18:11:31 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 25 18:12:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 18:12:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 18:15:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 18:15:13 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 25 18:18:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 18:18:22 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 18:20:34 fir-md1-s1 kernel: Lustre: 23558:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 18:20:34 fir-md1-s1 kernel: Lustre: 23558:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Jul 25 18:22:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 18:22:23 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 18:23:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 18:23:04 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 18:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 18:25:18 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 25 18:29:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 18:29:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 25 18:32:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 18:32:36 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 25 18:35:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 18:35:18 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 25 18:38:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 18:38:07 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 18:39:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 18:39:41 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 18:42:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 18:42:57 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 18:45:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 18:45:59 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 25 18:49:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 18:49:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 18:51:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 18:51:02 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 18:51:39 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 18:51:39 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Jul 25 18:53:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 18:53:53 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 18:56:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 18:56:03 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 25 19:00:15 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 19:00:15 fir-md1-s1 kernel: LustreError: Skipped 13 previous similar messages Jul 25 19:02:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 19:02:14 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 25 19:04:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 19:04:06 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 25 19:06:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 19:06:23 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 25 19:08:15 fir-md1-s1 kernel: Lustre: 21669:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 19:11:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 19:11:38 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 19:12:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 19:12:21 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 25 19:14:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 19:14:11 fir-md1-s1 kernel: Lustre: Skipped 15989 previous similar messages Jul 25 19:16:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 19:16:38 fir-md1-s1 kernel: Lustre: Skipped 16046 previous similar messages Jul 25 19:21:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 19:21:54 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 19:22:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 19:22:23 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 25 19:24:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 19:24:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 19:26:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 19:26:44 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 25 19:32:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 19:32:28 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 19:34:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 19:34:43 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 25 19:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 19:37:01 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 25 19:38:53 fir-md1-s1 kernel: Lustre: 23558:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 19:38:53 fir-md1-s1 kernel: Lustre: 23558:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jul 25 19:40:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 19:40:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 25 19:42:03 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 19:42:03 fir-md1-s1 kernel: Lustre: 21410:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Jul 25 19:42:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 19:42:37 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 19:45:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 19:45:01 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 19:47:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 19:47:01 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 19:51:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 19:51:14 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 25 19:53:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 19:53:04 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 25 19:55:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 19:55:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 25 19:57:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 19:57:03 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 25 20:01:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 20:01:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 25 20:03:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 20:03:35 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 20:05:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 20:05:30 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 25 20:07:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 20:07:21 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 25 20:11:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 25 20:11:30 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 25 20:13:20 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 20:13:20 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Jul 25 20:14:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 20:14:14 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 20:16:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 20:16:11 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 20:17:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 20:17:40 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 25 20:21:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 20:21:39 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 25 20:24:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 20:24:21 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 20:26:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 20:26:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 25 20:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 20:27:40 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 25 20:31:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 20:31:50 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 20:35:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 20:35:36 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 20:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 20:36:48 fir-md1-s1 kernel: Lustre: Skipped 22126 previous similar messages Jul 25 20:37:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 20:37:49 fir-md1-s1 kernel: Lustre: Skipped 22180 previous similar messages Jul 25 20:42:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 20:42:05 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 25 20:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 20:47:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 25 20:48:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 20:48:14 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 20:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 20:48:16 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 20:52:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 20:52:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 25 20:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 20:57:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 20:58:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 20:58:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 25 20:58:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 20:58:31 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 25 20:58:51 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 21:02:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 21:02:59 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 21:03:16 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Jul 25 21:05:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3de726d400, cur 1564113914 expire 1564113764 last 1564113687 Jul 25 21:05:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 25 21:07:45 fir-md1-s1 kernel: Lustre: 23634:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 21:07:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 21:07:58 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 21:08:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 21:08:32 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 25 21:08:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 21:08:58 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 21:13:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 21:13:40 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 25 21:18:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 21:18:07 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 25 21:18:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 21:18:34 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 25 21:18:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 21:18:59 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 25 21:23:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 21:23:44 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 25 21:28:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 21:28:35 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 25 21:28:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 21:28:36 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 21:31:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 21:31:27 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 25 21:33:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 21:33:50 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 25 21:38:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 21:38:36 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 25 21:38:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 21:38:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 21:42:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 21:42:33 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 25 21:43:46 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564116219/real 1564116219] req@ffff8f0bf9d73300 x1636746442042016/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564116226 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 25 21:43:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 21:43:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 21:48:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 25 21:48:39 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 25 21:49:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 21:49:03 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 25 21:53:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 21:53:22 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 21:54:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 21:54:41 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 21:58:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 21:58:55 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 25 21:59:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 21:59:16 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 22:04:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 22:04:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 25 22:05:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 22:05:27 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 22:09:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 22:09:00 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 25 22:09:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 22:09:20 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 25 22:17:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 22:17:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 22:17:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 22:17:54 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 25 22:19:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 22:19:05 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 25 22:20:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 22:20:14 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 22:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 22:29:07 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 25 22:30:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 25 22:30:19 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 25 22:30:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 22:30:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 25 22:36:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 22:36:22 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 25 22:39:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 22:39:09 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 25 22:40:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 22:40:29 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 25 22:40:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 22:40:54 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 25 22:49:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 22:49:10 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 25 22:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 22:50:31 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 22:51:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 22:51:28 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 22:52:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 22:52:33 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 25 22:59:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 22:59:13 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 25 23:00:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 25 23:00:42 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 23:01:35 fir-md1-s1 kernel: Lustre: 10307:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564120888/real 1564120888] req@ffff8f129f3a3300 x1636746491212976/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564120895 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 25 23:01:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 23:01:36 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 25 23:09:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 23:09:15 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 25 23:10:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 23:10:49 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 25 23:11:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 23:11:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 25 23:19:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 23:19:31 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 25 23:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 23:20:53 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 25 23:22:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 23:22:50 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 25 23:29:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 25 23:29:34 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 25 23:29:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 23:29:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 23:31:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 25 23:31:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 25 23:31:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 23:31:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 23:32:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 23:32:56 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 25 23:36:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 23:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 25 23:39:44 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 25 23:40:51 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 25 23:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 25 23:41:26 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 25 23:42:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 25 23:44:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 25 23:44:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 25 23:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 25 23:49:46 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 25 23:52:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 25 23:52:01 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 25 23:53:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 25 23:53:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 25 23:55:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 25 23:55:41 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 25 23:59:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 25 23:59:53 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 26 00:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 00:02:10 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 00:02:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0f0c871000, cur 1564124531 expire 1564124381 last 1564124304 Jul 26 00:08:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 00:08:02 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 00:08:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 00:08:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 00:09:45 fir-md1-s1 kernel: Lustre: 23575:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564124978/real 1564124978] req@ffff8f0b6290e900 x1636746533434448/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564124985 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 00:10:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 00:10:23 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 26 00:12:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 00:12:13 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 26 00:18:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 26 00:18:03 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 00:20:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 00:20:31 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 26 00:21:13 fir-md1-s1 kernel: Lustre: 23569:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564125665/real 1564125665] req@ffff8f11c342a100 x1636746542589712/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564125672 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 00:21:16 fir-md1-s1 kernel: Lustre: 23634:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 00:21:16 fir-md1-s1 kernel: Lustre: 23634:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Jul 26 00:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 
10.8.30.23@o2ib6) reconnecting Jul 26 00:22:33 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 00:23:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 00:23:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 00:29:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 00:29:08 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 26 00:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 00:30:32 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 26 00:32:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 00:32:34 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 26 00:33:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 00:33:27 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 00:39:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 26 00:39:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 00:40:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 00:40:38 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 26 00:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 00:42:37 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 00:48:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 00:48:57 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 00:51:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 00:51:07 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 26 00:51:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 00:51:57 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 00:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 00:53:06 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 00:54:14 fir-md1-s1 kernel: Lustre: 23575:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564127647/real 1564127647] req@ffff8f0aba009e00 x1636746562558032/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564127654 ref 1 fl Rpc:X/0/ffffffff 
rc 0/-1 Jul 26 01:01:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 01:01:14 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 26 01:01:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 01:01:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 01:03:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 01:03:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 01:03:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 01:03:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 01:08:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f096ffdf000, cur 1564128524 expire 1564128374 last 1564128297 Jul 26 01:11:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 01:11:26 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 26 01:13:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 01:13:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 01:14:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 01:14:05 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 26 01:14:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 26 01:14:40 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 01:21:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 01:21:26 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 26 01:23:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 01:23:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 01:24:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 01:24:36 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 01:26:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 01:31:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 01:31:34 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 26 01:33:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 01:33:30 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 01:35:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 01:35:32 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 01:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 01:41:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 26 01:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34b7a73400, cur 1564130527 expire 1564130377 last 1564130300 Jul 26 01:42:37 fir-md1-s1 kernel: Lustre: 23691:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 01:42:38 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 01:42:38 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Jul 26 01:42:40 fir-md1-s1 kernel: Lustre: 23593:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 01:42:40 fir-md1-s1 kernel: Lustre: 23593:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 45 previous similar messages Jul 26 01:42:42 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 01:42:42 fir-md1-s1 kernel: Lustre: 21411:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 139 previous similar messages Jul 26 01:43:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 01:43:56 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 01:44:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 01:44:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 01:46:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 01:46:37 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 26 01:51:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 01:51:59 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 26 01:54:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 01:54:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 01:56:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 01:56:39 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 02:02:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 02:02:21 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 26 02:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 02:04:31 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 02:05:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 02:05:25 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 02:06:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 02:06:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 02:11:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 02:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 02:12:32 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 26 02:14:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 02:14:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 02:15:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 02:15:14 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 02:17:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 02:17:14 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 02:22:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 02:22:33 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 26 02:25:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 02:25:30 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 02:27:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing 
former export from same NID Jul 26 02:27:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 26 02:32:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 02:32:35 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 02:34:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 02:34:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 02:35:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 02:35:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 02:35:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 02:37:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 02:37:35 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 02:42:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 02:42:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 02:42:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 02:42:45 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 26 02:46:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 02:46:40 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 02:47:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 02:47:42 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 26 02:53:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 02:53:08 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 26 02:56:25 fir-md1-s1 kernel: Lustre: 23602:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 02:56:25 fir-md1-s1 kernel: Lustre: 23602:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 76 previous similar messages Jul 26 02:56:39 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 02:56:39 fir-md1-s1 kernel: Lustre: 23651:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Jul 26 02:57:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 02:57:22 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 02:58:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 02:58:13 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 26 03:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 03:03:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 26 03:07:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 03:07:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 03:08:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 03:08:23 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 03:12:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 03:13:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 03:13:28 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 03:14:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 03:16:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 03:17:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 03:17:46 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 03:18:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 03:18:28 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 03:21:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 03:21:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 03:23:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 03:23:32 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 26 03:27:02 fir-md1-s1 kernel: Lustre: 10504:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 03:27:02 fir-md1-s1 kernel: Lustre: 10504:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Jul 26 03:27:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 03:27:52 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 03:28:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 03:28:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 03:28:59 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 03:33:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 03:33:37 fir-md1-s1 kernel: Lustre: Skipped 132 previous similar messages Jul 26 03:36:11 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 03:36:21 fir-md1-s1 kernel: Lustre: 23569:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564137374/real 1564137374] req@ffff8f13386a8c00 x1636746649656384/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564137381 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 03:37:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 03:37:55 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 03:41:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 26 03:41:27 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 03:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 03:44:00 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 26 03:45:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 03:45:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 03:47:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 03:47:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 03:50:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 03:54:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 03:54:03 fir-md1-s1 kernel: Lustre: Skipped 42723 previous similar messages Jul 26 03:54:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22a0b8c000, cur 1564138467 expire 1564138317 last 1564138240 Jul 26 03:54:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 03:54:52 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 03:55:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 03:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ddea53000, cur 1564138528 expire 1564138378 last 1564138301 Jul 26 03:58:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 03:58:03 fir-md1-s1 kernel: Lustre: Skipped 42708 previous similar messages Jul 26 04:02:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 04:02:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 04:04:04 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Jul 26 04:04:04 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Jul 26 04:04:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 04:04:18 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 04:08:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 04:08:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 04:11:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 04:11:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 04:14:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 04:14:38 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 04:17:33 fir-md1-s1 kernel: Lustre: 23593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564139846/real 1564139846] req@ffff8f10e40ada00 x1636746670553344/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 
lens 296/280 e 0 to 1 dl 1564139853 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 04:18:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 04:18:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 04:18:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 04:18:23 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 26 04:19:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 04:21:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 04:21:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 04:22:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 04:22:12 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 04:24:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 04:24:40 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 04:26:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 04:28:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 04:28:44 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 04:34:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 04:34:16 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 04:34:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 04:34:45 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 04:36:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 04:36:05 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 04:38:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 04:38:52 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 26 04:44:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 04:44:36 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 04:44:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 04:44:46 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 26 04:46:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 04:46:20 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 04:48:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 04:48:54 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 26 04:54:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 04:54:38 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 26 04:54:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 04:54:56 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 26 04:57:11 fir-md1-s1 kernel: LustreError: 21714:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f24eb762c50 x1631353478377456/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:16/0 lens 488/448 e 0 to 0 dl 1564142236 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 04:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 26 04:58:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 04:58:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 04:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 04:59:01 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 05:05:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 05:05:06 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 26 05:06:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 05:06:07 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 26 05:08:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 05:08:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 05:09:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 05:09:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 05:15:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 05:15:16 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 26 05:16:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 05:16:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 26 05:18:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 05:18:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 05:19:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 05:19:37 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 05:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 05:25:18 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 26 05:26:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 05:26:32 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 26 05:29:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 05:29:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 05:29:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 05:29:53 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 05:33:32 fir-md1-s1 kernel: Lustre: 23609:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564144405/real 1564144405] req@ffff8f0df051da00 x1636746690504480/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564144412 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 05:35:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 05:35:28 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 26 05:38:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 05:38:40 fir-md1-s1 kernel: Lustre: Skipped 30 previous 
similar messages Jul 26 05:40:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 05:40:15 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 05:42:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 05:42:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 05:45:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 05:45:28 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 26 05:48:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 05:48:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 05:50:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 05:50:44 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 26 05:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 05:56:07 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 26 05:57:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 05:58:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 05:58:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 06:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 06:00:46 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 06:04:42 fir-md1-s1 kernel: Lustre: 10308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564146275/real 1564146275] req@ffff8f12f9630600 x1636746698885568/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564146282 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 06:06:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 06:06:15 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 26 06:08:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 06:08:55 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 26 06:10:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 06:10:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 06:15:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 06:15:12 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 26 06:16:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 06:16:16 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 26 06:19:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 06:19:21 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 06:21:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 06:21:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 06:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f3c24c00, cur 1564147397 expire 1564147247 last 1564147170 Jul 26 06:26:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 06:26:17 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 26 06:29:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 06:29:25 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 26 06:30:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 06:30:01 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 06:31:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 06:31:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 06:36:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 06:36:18 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 26 06:40:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 06:40:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 26 06:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 06:41:29 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 06:46:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 06:46:26 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 26 06:47:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 06:47:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 06:51:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 06:51:40 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 06:51:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 26 06:51:44 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 06:56:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 06:56:35 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 26 06:59:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 06:59:35 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 07:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 07:01:47 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 26 07:03:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 07:03:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 07:06:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 07:06:38 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 26 07:07:50 fir-md1-s1 kernel: Lustre: 23634:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564150063/real 1564150063] req@ffff8f0b8a0e8600 x1636746714735584/t0(0) o106->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564150070 ref 1 fl Rpc:X/0/ffffffff 
rc 0/-1 Jul 26 07:12:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 07:12:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 26 07:14:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 07:14:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 07:14:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 26 07:14:19 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 26 07:17:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 07:17:05 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 26 07:22:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 07:22:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 07:25:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 07:25:08 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 26 07:26:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 07:26:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 26 07:27:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 07:27:06 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 26 07:32:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 07:32:42 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 26 07:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 07:37:07 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 26 07:37:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 07:37:07 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 26 07:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 07:43:48 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 26 07:47:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 07:47:46 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 26 07:48:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 07:48:41 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 26 07:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 07:53:56 fir-md1-s1 kernel: Lustre: Skipped 
13 previous similar messages Jul 26 07:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 07:57:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 08:00:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 08:00:48 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 26 08:02:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 08:02:17 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 26 08:04:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 08:04:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 26 08:07:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 08:07:55 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 26 08:11:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 08:11:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 08:13:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 08:13:29 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 08:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 08:14:10 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 08:14:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 08:14:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 08:17:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 08:17:55 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 26 08:20:33 fir-md1-s1 kernel: list passed to list_sort() too long for efficiency Jul 26 08:23:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 08:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 08:24:20 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 08:26:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 08:26:17 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 08:28:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 08:28:10 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 26 08:32:28 fir-md1-s1 kernel: Lustre: 21567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f107c63f050 x1638829686389008/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564155153 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 08:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 08:34:21 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 08:36:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 26 08:36:39 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 26 08:38:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 08:38:12 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 08:40:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 08:40:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 08:44:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 08:44:30 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 08:48:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 26 08:48:18 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 26 08:48:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 08:48:18 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 26 08:55:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 08:55:13 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 08:57:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 08:57:40 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 08:58:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 26 08:58:29 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 26 08:58:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 08:58:29 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 26 09:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 09:05:36 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 09:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 09:08:43 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 09:09:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 09:09:10 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 09:15:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 09:15:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 09:15:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 09:15:47 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 09:18:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 09:18:50 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 26 09:20:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 09:20:53 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 09:25:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 09:25:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 09:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 09:26:15 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 09:29:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 09:29:01 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 09:31:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 09:31:08 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 26 09:37:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 09:37:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 09:38:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 09:38:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 09:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 09:39:43 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 26 09:44:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 09:44:36 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 09:44:40 fir-md1-s1 kernel: Lustre: 20571:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564159473/real 1564159473] req@ffff8f080a061500 x1636746749341328/t0(0) o106->fir-MDT0000@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564159480 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 09:47:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 09:47:44 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 26 09:49:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 09:49:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 09:49:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 09:49:54 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 09:55:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 09:55:55 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 26 09:58:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 09:58:12 fir-md1-s1 kernel: Lustre: Skipped 85414 previous similar messages Jul 26 09:58:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2916c5ec00, cur 1564160305 expire 1564160155 last 1564160078 Jul 26 09:59:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 09:59:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 10:00:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 10:00:07 fir-md1-s1 kernel: Lustre: Skipped 85430 previous similar messages Jul 26 10:02:10 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 26 10:06:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 10:06:05 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 26 10:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 10:08:15 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 10:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 10:10:13 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 26 10:11:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 10:11:17 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 26 10:17:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 10:17:08 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 26 10:18:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 10:18:27 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 10:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 10:20:29 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 26 10:21:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 10:21:35 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 26 10:23:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250acd9400, cur 1564161786 expire 1564161636 last 1564161559 Jul 26 10:27:17 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Jul 26 10:27:17 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (5): c: 0, oc: 0, rc: 7 Jul 26 10:27:17 fir-md1-s1 kernel: LNetError: 55537:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 26 10:27:17 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f40d664b800 Jul 26 10:27:17 fir-md1-s1 kernel: LNetError: 55554:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.30.5@o2ib6 from 10.0.10.51@o2ib7 Jul 26 10:27:17 fir-md1-s1 kernel: LustreError: 46575:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f11e6d35000 Jul 26 10:27:17 fir-md1-s1 kernel: LustreError: 21996:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f4042643850 x1638883894067648/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564162037 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 10:27:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1800347-72ce-eadd-608d-51a435000390 (at 10.9.112.15@o2ib4), client will retry: rc -110 Jul 26 10:27:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 26 10:27:17 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1c0ef7ea00 Jul 26 10:27:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.209@o2ib7: accepting Jul 26 10:27:17 fir-md1-s1 kernel: LNetError: 55537:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 26 10:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP 
connection from 10.9.105.50@o2ib4, removing former export from same NID Jul 26 10:27:19 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 26 10:27:21 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f26937af050 x1635711564322784/t0(0) o3->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:1/0 lens 488/440 e 1 to 0 dl 1564162051 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 10:27:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9dcf2f2b-339d-b96d-0792-e79b27f28969 (at 10.8.28.2@o2ib6), client will retry: rc -110 Jul 26 10:27:22 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f16537f6c50 x1631587988830016/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:2/0 lens 488/440 e 1 to 0 dl 1564162052 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 10:27:22 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 26 10:27:24 fir-md1-s1 kernel: LustreError: 46575:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1ec4c0d850 x1633752845553328/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:2/0 lens 488/440 e 1 to 0 dl 1564162052 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 10:27:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 26 10:27:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 10:28:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 10:28:39 fir-md1-s1 kernel: Lustre: Skipped 200 previous similar messages Jul 26 10:30:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 10:30:29 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages 
Jul 26 10:33:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 10:33:34 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 10:38:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 10:38:09 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 26 10:38:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 10:38:53 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 10:40:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 10:40:33 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 26 10:44:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 10:44:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 10:48:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 10:48:56 fir-md1-s1 kernel: Lustre: Skipped 12593 previous similar messages Jul 26 10:48:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 10:48:58 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 26 10:50:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 10:50:35 fir-md1-s1 kernel: Lustre: Skipped 12631 previous similar messages Jul 26 10:54:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 10:54:55 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 10:58:48 fir-md1-s1 kernel: LustreError: 46552:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f19bcad4850 x1631353486214448/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:23/0 lens 488/448 e 0 to 0 dl 1564163933 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 10:58:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 26 10:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 10:58:57 fir-md1-s1 kernel: Lustre: Skipped 14956 previous similar messages Jul 26 10:59:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 10:59:36 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 11:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection 
restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 11:00:46 fir-md1-s1 kernel: Lustre: Skipped 15002 previous similar messages Jul 26 11:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client de202384-8f7d-4975-20ab-e67269969e78 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522991000, cur 1564164143 expire 1564163993 last 1564163916 Jul 26 11:04:49 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 26 11:04:49 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Jul 26 11:04:49 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.107@o2ib7 (6): c: 0, oc: 0, rc: 8 Jul 26 11:04:49 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Jul 26 11:04:49 fir-md1-s1 kernel: Lustre: 71848:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f2d1099dc50 x1634933928082816/t0(0) o35->36c50ebf-42f1-2e51-f789-02d6d7eec692@10.8.8.33@o2ib6:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 26 11:04:49 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564164283/real 1564164289] req@ffff8f28303f3f00 x1636746789607152/t0(0) o13->fir-OST002e-osc-MDT0002@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564164290 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 26 11:04:49 fir-md1-s1 kernel: Lustre: fir-OST002e-osc-MDT0002: Connection to fir-OST002e (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 26 11:04:49 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3996e4ae00 Jul 26 11:04:49 fir-md1-s1 
kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f40cd35a800 Jul 26 11:04:49 fir-md1-s1 kernel: Lustre: 71848:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2 previous similar messages Jul 26 11:04:50 fir-md1-s1 kernel: LustreError: 79335:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f376d6de850 x1634134146456640/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564164303 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:04:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 89c5b213-fa16-71ad-d5f3-58d49989ce10 (at 10.9.115.11@o2ib4), client will retry: rc -110 Jul 26 11:04:51 fir-md1-s1 kernel: LustreError: 46584:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3995771850 x1638933913573072/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564164303 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 294f669a-76d8-9cb4-d54f-e33a51dba159 (at 10.9.112.11@o2ib4), client will retry: rc -110 Jul 26 11:04:51 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 11:04:51 fir-md1-s1 kernel: LustreError: 46584:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 30 previous similar messages Jul 26 11:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc = -110 Jul 26 11:04:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c4566649-5001-d956-15cb-934d725d7f29 (at 10.9.113.11@o2ib4), client will retry: rc -110 Jul 26 11:04:52 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 26 11:04:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 829e8e6e-3608-cb1f-779c-fe5437a6c742 (at 10.9.102.33@o2ib4), client will retry: rc = -110 Jul 26 11:04:54 fir-md1-s1 
kernel: LustreError: 21545:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1395be3450 x1631543890162720/t0(0) o4->3b71506a-346d-881f-3646-b49dad69578d@10.9.101.64@o2ib4:19/0 lens 504/448 e 0 to 0 dl 1564164319 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:04:54 fir-md1-s1 kernel: LustreError: 21545:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 10 previous similar messages Jul 26 11:04:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 4068d1dc-23c0-77d4-d74f-2d23e9a4aa67 (at 10.8.2.30@o2ib6), client will retry: rc = -110 Jul 26 11:04:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 11:04:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1a9f40f3-2131-e092-9c2d-84d78e23572f (at 10.9.114.14@o2ib4), client will retry: rc -110 Jul 26 11:04:58 fir-md1-s1 kernel: Lustre: 46543:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3995772450 x1631550480034512/t0(0) o4->0b1e5e12-b864-011b-3b01-364d3bf9baff@10.9.107.56@o2ib4:3/0 lens 520/456 e 1 to 0 dl 1564164303 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:00 fir-md1-s1 kernel: Lustre: 46543:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f40e8780c50 x1633747726040720/t0(0) o4->1a5994ed-f702-43b8-5d0a-573a3a27bb32@10.9.107.32@o2ib4:5/0 lens 520/456 e 1 to 0 dl 1564164305 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:00 fir-md1-s1 kernel: Lustre: 46543:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 11:05:03 fir-md1-s1 kernel: LustreError: 46555:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 14+0s req@ffff8f2eaa3c9c50 x1636449849824304/t0(0) o3->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564164303 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 
59f098aa-fb21-8ed8-84bd-d0ce06cad654 (at 10.9.102.46@o2ib4), client will retry: rc -110 Jul 26 11:05:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 11:05:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 0b1e5e12-b864-011b-3b01-364d3bf9baff (at 10.9.107.56@o2ib4), client will retry: rc = -110 Jul 26 11:05:03 fir-md1-s1 kernel: Lustre: 21996:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3e4c360c50 x1635091251089232/t0(0) o4->a2c269ef-57a9-8b99-0a4b-44a7d221d7bd@10.9.109.36@o2ib4:8/0 lens 504/448 e 1 to 0 dl 1564164308 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:05 fir-md1-s1 kernel: LustreError: 21741:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f40e8780c50 x1633747726040720/t0(0) o4->1a5994ed-f702-43b8-5d0a-573a3a27bb32@10.9.107.32@o2ib4:5/0 lens 520/456 e 1 to 0 dl 1564164305 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:05 fir-md1-s1 kernel: LustreError: 21741:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 11:05:08 fir-md1-s1 kernel: LustreError: 21534:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f3e4c360c50 x1635091251089232/t0(0) o4->a2c269ef-57a9-8b99-0a4b-44a7d221d7bd@10.9.109.36@o2ib4:8/0 lens 504/448 e 1 to 0 dl 1564164308 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:12 fir-md1-s1 kernel: Lustre: 21298:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f36cef20c50 x1634143573349600/t0(0) o4->a8495761-7359-3610-2479-b4da362523dd@10.9.101.31@o2ib4:17/0 lens 488/448 e 0 to 0 dl 1564164317 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:12 fir-md1-s1 kernel: Lustre: 21298:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 26 11:05:17 fir-md1-s1 kernel: LustreError: 21995:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s 
req@ffff8f36cef20c50 x1634143573349600/t0(0) o4->a8495761-7359-3610-2479-b4da362523dd@10.9.101.31@o2ib4:17/0 lens 488/448 e 0 to 0 dl 1564164317 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:17 fir-md1-s1 kernel: LustreError: 21995:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 26 11:05:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a8495761-7359-3610-2479-b4da362523dd (at 10.9.101.31@o2ib4), client will retry: rc = -110 Jul 26 11:05:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 26 11:05:18 fir-md1-s1 kernel: LustreError: 20503:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1cdb8d7450 x1635089915710128/t0(0) o4->ddc9790c-0eb3-6a50-110f-d17442bde73c@10.9.107.53@o2ib4:19/0 lens 504/448 e 0 to 0 dl 1564164319 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:05:18 fir-md1-s1 kernel: LustreError: 20503:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 26 11:05:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 11:05:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 11:09:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 11:09:21 fir-md1-s1 kernel: Lustre: Skipped 5798 previous similar messages Jul 26 11:10:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 11:10:38 fir-md1-s1 kernel: Lustre: Skipped 188 previous similar messages Jul 26 11:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 11:10:56 fir-md1-s1 kernel: Lustre: Skipped 5973 previous similar messages Jul 26 11:17:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 11:17:36 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 11:19:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 11:19:23 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 26 11:20:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 11:20:57 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 11:21:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 11:21:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 26 11:29:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 11:29:33 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 11:30:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 11:30:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 11:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 11:31:08 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 26 11:34:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 26 11:34:40 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 26 11:39:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 11:39:34 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: 23103:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f15a6243c00 x1637402328275632/t0(0) o102->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 26 11:41:12 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f41b5214050 x1639232596135984/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564166472 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:12 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 11:41:12 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 26 11:41:12 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f371ac95c00 Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 
10.9.112.16@o2ib4), client will retry: rc -110 Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: 20206:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564166472/real 1564166472] req@ffff8f13e5d47b00 x1636746807672416/t0(0) o41->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1564166479 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: 20206:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0003-osp-MDT0000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 26 11:41:12 fir-md1-s1 kernel: Lustre: 23103:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 6 previous similar messages Jul 26 11:41:14 fir-md1-s1 kernel: LustreError: 21448:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1faed89c50 x1631579888884080/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:26/0 lens 488/440 e 1 to 0 dl 1564166486 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 26 11:41:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 11:41:14 fir-md1-s1 kernel: LustreError: 21448:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 26 11:41:16 fir-md1-s1 kernel: LustreError: 46591:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f25217cac50 x1635691977471792/t0(0) 
o4->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:2/0 lens 488/448 e 1 to 0 dl 1564166492 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:16 fir-md1-s1 kernel: LustreError: 46591:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 26 11:41:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc = -110 Jul 26 11:41:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 11:41:16 fir-md1-s1 kernel: Lustre: 20212:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564166469/real 1564166472] req@ffff8f09b4a72700 x1636746807671840/t0(0) o41->fir-MDT0003-osp-MDT0002@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1564166476 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 11:41:16 fir-md1-s1 kernel: Lustre: 20212:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 26 11:41:16 fir-md1-s1 kernel: Lustre: fir-MDT0003-osp-MDT0002: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 26 11:41:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 11:41:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 26 11:41:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 26 11:41:18 fir-md1-s1 kernel: LustreError: 27584:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f20147b1850 x1639526929202160/t0(0) o4->c7317de1-dc12-7eeb-b2c7-8bda04dd1f78@10.8.7.8@o2ib6:2/0 lens 520/456 e 1 to 0 dl 1564166492 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:18 fir-md1-s1 kernel: LustreError: 27584:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 26 11:41:21 fir-md1-s1 kernel: Lustre: 
27584:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1faed8d450 x1638088667389136/t0(0) o3->82cc7b58-93fc-4c30-9e15-8687148a95b5@10.8.1.1@o2ib6:26/0 lens 488/440 e 1 to 0 dl 1564166486 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:21 fir-md1-s1 kernel: Lustre: 27584:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 26 11:41:26 fir-md1-s1 kernel: LustreError: 21541:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 14+0s req@ffff8f1faed8d450 x1638088667389136/t0(0) o3->82cc7b58-93fc-4c30-9e15-8687148a95b5@10.8.1.1@o2ib6:26/0 lens 488/440 e 1 to 0 dl 1564166486 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:26 fir-md1-s1 kernel: LustreError: 21541:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 11:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 82cc7b58-93fc-4c30-9e15-8687148a95b5 (at 10.8.1.1@o2ib6), client will retry: rc -110 Jul 26 11:41:27 fir-md1-s1 kernel: Lustre: 23455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1f74039200 x1634130785405792/t0(0) o101->3c321701-2950-e3a0-e425-740898be58b7@10.8.28.4@o2ib6:2/0 lens 376/1600 e 1 to 0 dl 1564166492 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:32 fir-md1-s1 kernel: LustreError: 46547:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1faed8e050 x1635095356806032/t0(0) o4->b16e4006-ad8f-de37-ede7-21e0aff43fcc@10.8.1.3@o2ib6:2/0 lens 488/448 e 1 to 0 dl 1564166492 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 11:41:32 fir-md1-s1 kernel: LustreError: 46547:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 11:41:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with b16e4006-ad8f-de37-ede7-21e0aff43fcc (at 10.8.1.3@o2ib6), client will retry: rc = -110 Jul 26 11:41:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous 
similar messages Jul 26 11:41:34 fir-md1-s1 kernel: Lustre: 21429:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f1f74039200 x1634130785405792/t354939202531(0) o101->3c321701-2950-e3a0-e425-740898be58b7@10.8.28.4@o2ib6:2/0 lens 376/968 e 1 to 0 dl 1564166492 ref 1 fl Complete:/0/0 rc 0/0 Jul 26 11:45:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 11:45:02 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 26 11:47:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 11:47:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 11:49:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 11:49:44 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 26 11:51:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 11:51:16 fir-md1-s1 kernel: Lustre: Skipped 145 previous similar messages Jul 26 11:55:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 11:55:04 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 11:57:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 11:57:30 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 12:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 12:00:02 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 26 12:01:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 12:01:22 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 26 12:05:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 12:05:41 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 26 12:06:38 fir-md1-s1 kernel: Lustre: 21669:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564167991/real 1564167991] req@ffff8f11f2251800 x1636746821945696/t0(0) o106->fir-MDT0000@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564167998 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 12:08:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 12:08:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 12:10:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 12:10:23 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 26 12:11:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 12:11:33 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 26 12:15:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 12:15:42 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 26 12:16:10 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0d5d03f450 x1637104937205136/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:15/0 lens 488/440 e 1 to 0 dl 1564168575 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 12:16:10 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 26 12:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 12:20:35 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 12:21:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 12:21:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 26 12:27:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 12:27:05 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 26 12:30:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 12:30:46 
fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 26 12:32:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 12:32:23 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 26 12:35:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 12:35:53 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 12:37:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 12:37:40 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 26 12:40:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 12:40:49 fir-md1-s1 kernel: Lustre: Skipped 23720 previous similar messages Jul 26 12:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 12:42:24 fir-md1-s1 kernel: Lustre: Skipped 23757 previous similar messages Jul 26 12:45:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 12:48:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 12:48:25 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 26 12:51:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 12:51:06 fir-md1-s1 kernel: Lustre: Skipped 35475 previous similar messages Jul 26 12:51:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 12:51:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 12:52:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 12:52:41 fir-md1-s1 kernel: Lustre: Skipped 35480 previous similar messages Jul 26 12:57:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 12:57:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 12:58:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 12:58:51 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 26 13:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 13:01:47 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 26 13:02:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 13:02:42 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 26 13:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b56a1ec8-0b0a-d175-8a94-cceee8d14724 (at 10.8.12.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24ee765400, cur 1564171505 expire 1564171355 last 1564171278 Jul 26 13:05:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 13:08:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 13:08:23 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 13:09:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 13:09:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 13:11:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 13:11:55 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 26 13:11:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0d25a420-fc91-d79e-e567-c2baf664cda3 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2db293f400, cur 1564171916 expire 1564171766 last 1564171689 Jul 26 13:11:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 13:12:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ffa27290-6cf4-9b77-ab2a-7df1aa693fad (at 10.8.21.21@o2ib6) Jul 26 13:12:44 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 26 13:21:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 26 13:21:18 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 13:22:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 13:22:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 26 13:23:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 13:23:01 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 26 13:23:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 13:32:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 13:32:17 fir-md1-s1 kernel: Lustre: Skipped 29244 previous similar messages Jul 26 13:32:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 13:32:32 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 26 13:33:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 13:33:11 fir-md1-s1 kernel: Lustre: Skipped 29256 previous similar messages Jul 26 13:35:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 13:35:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 13:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 91a114f3-5ca9-fb70-2f8e-3d44af72071b (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f21dc2ea000, cur 1564173396 expire 1564173246 last 1564173169 Jul 26 13:36:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 13:42:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 13:42:21 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 13:42:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 13:42:40 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 26 13:43:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 13:43:47 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 26 13:45:32 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 26 13:47:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 13:47:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 13:52:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 13:52:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 26 13:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 13:53:06 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 26 13:53:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 13:53:50 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 26 13:57:41 fir-md1-s1 kernel: Lustre: 23651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1321ad2d00 x1639515475708592/t0(0) o36->04c17dce-45f1-fe7e-2627-7efeaaeaddb9@10.9.0.62@o2ib4:16/0 lens 496/448 e 1 to 0 dl 1564174666 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:57:41 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2151c1bc00 x1631314999466944/t0(0) o101->2defae61-8bf0-dee6-7d48-53b83a69e973@10.8.17.24@o2ib6:16/0 lens 584/3264 e 1 to 0 dl 1564174666 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:57:41 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages Jul 26 13:57:42 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e417b8300 x1631616936119488/t0(0) o101->d8d6f8e7-a2cd-08f2-c263-fa8b0dbeef3c@10.8.8.2@o2ib6:17/0 lens 584/3264 e 1 to 0 dl 1564174667 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:57:42 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 45 previous similar messages Jul 26 13:57:44 
fir-md1-s1 kernel: Lustre: 97665:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d78407b00 x1634174143057472/t0(0) o101->7be5bbe0-2731-daa5-0df1-9cf6bf850b1e@10.8.27.14@o2ib6:19/0 lens 584/3264 e 1 to 0 dl 1564174669 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:57:44 fir-md1-s1 kernel: Lustre: 97665:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 46 previous similar messages Jul 26 13:57:48 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34e778d100 x1631545502721104/t0(0) o101->21db4e74-db2a-768a-66c3-cfe236936806@10.8.2.22@o2ib6:23/0 lens 584/3264 e 1 to 0 dl 1564174673 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:57:48 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32 previous similar messages Jul 26 13:57:57 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f262110ce00 x1631619039212832/t0(0) o101->e45eae18-7cf5-c24e-ada4-411d043e0647@10.8.7.19@o2ib6:2/0 lens 584/3264 e 0 to 0 dl 1564174682 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:57:57 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 26 13:58:16 fir-md1-s1 kernel: Lustre: 23610:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f348ccd8c00 x1633715523980368/t0(0) o101->72ec30aa-3de0-c9e1-e316-3673d47174c8@10.8.8.8@o2ib6:21/0 lens 584/3264 e 1 to 0 dl 1564174701 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 13:58:16 fir-md1-s1 kernel: Lustre: 23610:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 26 13:58:34 fir-md1-s1 kernel: Lustre: 24579:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:1s); client may 
timeout. req@ffff8f22b1455100 x1638237513525296/t0(0) o101->8ec1acae-5541-1224-6330-34435f948ba9@10.9.106.61@o2ib4:2/0 lens 584/536 e 0 to 0 dl 1564174713 ref 1 fl Complete:/0/0 rc 0/0 Jul 26 13:58:34 fir-md1-s1 kernel: Lustre: 24579:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Jul 26 14:02:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 14:02:50 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 26 14:03:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 14:03:23 fir-md1-s1 kernel: Lustre: Skipped 674 previous similar messages Jul 26 14:03:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 14:03:56 fir-md1-s1 kernel: Lustre: Skipped 729 previous similar messages Jul 26 14:07:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 14:07:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 14:13:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 14:13:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 14:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 14:14:08 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 26 14:14:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 14:14:11 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 14:17:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 14:17:53 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 26 14:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 098aaeb7-6554-e0af-2763-bd26ea92671b (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f225ac4dc00, cur 1564176199 expire 1564176049 last 1564175972 Jul 26 14:23:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 14:23:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 14:23:43 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 14:24:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 14:24:10 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 26 14:26:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 14:26:23 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 14:28:30 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 26 14:29:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 14:29:09 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 14:33:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 14:33:49 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 14:34:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 14:34:13 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 26 14:37:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 14:37:22 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 14:40:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 14:40:12 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 26 14:41:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6c05b78c-fac7-a022-674f-0e421e702ef3 (at 10.9.108.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fc35dc00, cur 1564177284 expire 1564177134 last 1564177057 Jul 26 14:41:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 14:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6c05b78c-fac7-a022-674f-0e421e702ef3 (at 10.9.108.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1506595400, cur 1564177286 expire 1564177136 last 1564177059 Jul 26 14:41:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 14:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 14:44:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 14:44:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 14:44:17 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 26 14:47:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 14:47:31 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 26 14:50:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 14:50:23 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 14:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 14:54:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 14:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 14:54:32 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 26 14:57:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 30211097-8ceb-daaa-1fcc-f1cb6ab40fba (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2457a0a800, cur 1564178266 expire 1564178116 last 1564178039 Jul 26 14:57:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 26 14:57:58 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 15:02:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 15:02:17 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 26 15:04:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 15:04:38 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 26 15:04:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 15:04:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 15:07:20 fir-md1-s1 kernel: Lustre: 21715:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f11f6331050 x1631629663453776/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564178845 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 15:07:20 fir-md1-s1 kernel: Lustre: 21715:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 26 15:07:29 fir-md1-s1 kernel: Lustre: 24213:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. 
req@ffff8f11f6331050 x1631629663453776/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:25/0 lens 488/408 e 1 to 0 dl 1564178845 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 26 15:09:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 26 15:09:35 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 15:14:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 15:14:49 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 26 15:14:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 15:14:49 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Jul 26 15:17:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 15:17:26 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 15:21:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 15:21:16 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 26 15:25:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 15:25:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 15:25:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 15:25:06 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 15:27:50 fir-md1-s1 kernel: Lustre: 21715:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1395be3c50 x1639233092844528/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564180075 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 15:29:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 15:29:05 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 26 15:32:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 26 15:32:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 15:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 15:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 15:35:58 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 15:35:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 15:38:37 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0d5d038c50 x1637105338316272/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564180722 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 15:38:44 fir-md1-s1 kernel: Lustre: 21451:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f0d5d038c50 x1637105338316272/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:12/0 lens 488/408 e 1 to 0 dl 1564180722 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 26 15:40:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 15:40:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 15:42:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 15:42:15 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 26 15:46:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 15:46:22 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 26 15:46:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 15:46:22 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 15:51:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 15:51:20 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 15:53:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 15:53:53 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 15:56:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 15:56:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 15:56:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 15:56:34 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 16:03:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 16:03:18 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 16:03:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 16:03:55 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 16:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 16:06:45 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 16:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 16:06:45 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 16:07:47 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 26 16:07:47 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 26 16:15:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 16:15:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 16:16:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 16:16:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 16:17:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 16:17:14 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 16:17:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 16:17:14 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 26 16:26:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 16:26:23 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 26 16:27:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 16:27:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 16:27:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 16:27:16 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 26 16:27:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 16:27:32 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 16:36:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 16:36:33 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 26 16:37:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 16:37:21 fir-md1-s1 kernel: Lustre: Skipped 129 previous similar messages Jul 26 16:37:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 16:37:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 16:38:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 16:38:10 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 26 16:44:33 fir-md1-s1 kernel: Lustre: 81717:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f125673e850 x1638950272072576/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:7/0 lens 488/440 e 0 to 0 dl 1564184677 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 16:44:37 fir-md1-s1 kernel: LustreError: 21709:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f125673e850 x1638950272072576/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:7/0 lens 488/440 e 0 to 0 dl 1564184677 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 16:44:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 26 16:44:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 16:47:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 16:47:22 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 16:47:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 16:47:22 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 26 16:48:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 16:48:15 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 26 16:49:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 26 16:49:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 16:56:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17ed3d22-74e0-0a33-29fe-26352205d024 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f40f9fa0000, cur 1564185365 expire 1564185215 last 1564185138 Jul 26 16:56:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 16:57:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 16:57:33 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 26 16:58:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 16:58:43 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 16:59:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 16:59:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 17:00:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 17:00:32 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 17:01:26 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 26 17:07:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 17:07:41 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 26 17:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 17:08:44 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 17:11:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 17:11:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 26 17:11:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 17:11:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 17:17:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 17:17:54 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 26 17:18:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 17:18:53 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 17:21:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 17:21:18 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 17:26:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 26 17:26:00 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 17:28:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 17:28:03 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 26 17:28:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 17:28:58 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 17:31:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 17:31:44 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 17:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e62a5e000, cur 1564187753 expire 1564187603 last 1564187526 Jul 26 17:35:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 17:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 17:38:06 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 26 17:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 17:39:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 17:41:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 17:41:57 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 17:48:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 17:48:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 26 17:49:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 17:49:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 17:49:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 17:49:45 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 17:51:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 17:53:32 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f124eecac50 x1639233398386128/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564188816 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 17:55:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 17:56:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 17:56:06 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 17:58:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 17:58:07 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 26 18:00:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 18:00:20 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 18:02:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 18:02:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 18:06:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 18:06:16 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 26 18:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 18:08:15 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 26 18:10:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 18:10:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 18:12:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 18:12:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 18:15:37 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f093a6ec850 x1638869744060480/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564190142 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 18:15:45 fir-md1-s1 kernel: LustreError: 21484:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f093a6ec850 x1638869744060480/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564190142 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 18:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 2f627314-68e3-35d2-70d7-0cd2604dd048 (at 10.9.115.4@o2ib4), client will retry: rc -107 Jul 26 18:15:45 fir-md1-s1 kernel: Lustre: 21484:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. 
req@ffff8f093a6ec850 x1638869744060480/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564190142 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 26 18:18:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 18:18:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 18:18:58 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 26 18:18:58 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 18:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 18:20:35 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 18:21:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2d109bdc00, cur 1564190488 expire 1564190338 last 1564190261 Jul 26 18:23:09 fir-md1-s1 kernel: Lustre: 20501:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f093a6ea050 x1631550556561664/t0(0) o3->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564190594 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 18:23:21 fir-md1-s1 kernel: LustreError: 21485:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f093a6ea050 x1631550556561664/t0(0) o3->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564190594 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 18:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc -107 Jul 26 18:23:21 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:7s); client may timeout. req@ffff8f093a6ea050 x1631550556561664/t0(0) o3->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564190594 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 26 18:23:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 18:23:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 26 18:29:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 18:29:27 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 26 18:29:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 18:29:55 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 18:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 18:31:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 26 18:34:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e25e6ce8-03d8-aa3c-798b-77ef8f87fe58 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1cb49e8400, cur 1564191268 expire 1564191118 last 1564191041 Jul 26 18:37:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 18:37:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 18:39:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 18:39:28 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 26 18:40:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 18:40:33 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 26 18:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 18:41:25 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 18:44:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cfd2d74f-ab49-0ef3-d616-0f32b36c0e4f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f276dbed800, cur 1564191847 expire 1564191697 last 1564191620 Jul 26 18:44:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 18:46:20 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 26 18:46:20 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 26 18:49:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 18:49:28 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 26 18:51:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 18:51:36 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 18:52:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 26 18:52:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 18:52:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 26 18:52:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 18:59:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 18:59:31 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 26 19:01:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 266c7cb5-7894-c657-4a4c-d59131e9dbf8 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e5a73d400, cur 1564192866 expire 1564192716 last 1564192639 Jul 26 19:01:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 19:01:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 19:01:49 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 19:02:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 19:02:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 19:02:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 19:02:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 19:09:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 19:09:41 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 26 19:11:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 19:11:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 26 19:12:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 19:12:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 19:12:54 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 26 19:15:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 10219fee-a269-3b9b-dffa-f0473a5e7caf (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f072d34d400, cur 1564193700 expire 1564193550 last 1564193473 Jul 26 19:15:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 19:20:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 19:20:06 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 19:22:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 19:22:07 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 19:22:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 19:22:53 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 26 19:25:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 19:25:35 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 26 19:30:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 19:30:53 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 26 19:32:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 19:32:12 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 19:33:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 19:33:06 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 19:35:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 19:35:37 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 26 19:41:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 19:41:30 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 19:42:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 19:42:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 26 19:44:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 19:44:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 19:45:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 19:45:47 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 26 19:51:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 19:51:33 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 26 19:52:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 19:52:34 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 19:55:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 19:55:51 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 26 19:57:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 
(no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 19:57:04 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 20:01:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 20:01:37 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 26 20:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 20:03:22 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 26 20:05:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 20:05:54 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 26 20:11:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 20:11:43 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 26 20:12:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 20:12:27 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 20:13:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 20:13:28 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 20:21:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 20:21:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 20:21:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 20:21:53 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 20:23:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 20:23:39 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 20:29:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 20:29:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 20:31:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 26 20:31:12 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 20:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 20:32:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 26 20:33:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 20:33:52 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 20:41:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 20:41:17 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 20:42:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 20:42:09 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 26 20:43:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 20:43:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 20:44:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 20:44:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 20:51:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 20:51:19 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 26 20:52:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 20:52:20 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 26 20:54:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 20:54:11 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 26 20:55:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 20:55:24 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 26 21:01:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 21:01:34 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 26 21:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 21:02:24 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 26 21:04:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 21:04:43 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 21:09:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 21:09:18 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 21:12:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 21:12:54 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 26 21:14:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 21:14:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 26 21:18:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 21:18:06 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 26 21:22:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 21:22:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 21:22:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 21:22:56 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 26 21:25:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 21:25:07 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 21:28:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 21:28:09 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 26 21:32:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 21:32:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 21:33:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 21:33:08 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 26 21:35:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 21:35:14 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 21:38:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 21:38:19 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 26 21:42:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e8f3d29a-2240-447c-e393-d5c3950490d7 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ec11eac00, cur 1564202556 expire 1564202406 last 1564202329 Jul 26 21:42:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 21:42:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e8f3d29a-2240-447c-e393-d5c3950490d7 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0d45011800, cur 1564202560 expire 1564202410 last 1564202333 Jul 26 21:42:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 21:42:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 21:42:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 21:43:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 21:43:28 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 21:43:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 153 seconds. I think it's dead, and I am evicting it. exp ffff8f1f5be76800, cur 1564202632 expire 1564202482 last 1564202479 Jul 26 21:45:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 21:45:17 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 21:48:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 21:48:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 26 21:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 26 21:54:06 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 26 21:55:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 21:55:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 21:58:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 21:58:27 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 26 21:59:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 21:59:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 22:04:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 22:04:30 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 26 22:05:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 26 22:05:21 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 0, rc: 8 Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 26 22:06:25 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 50580:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.18.21@o2ib6 from 10.0.10.51@o2ib7 Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 50580:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 228 previous similar messages Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 46574:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f203aa76c00 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 21245:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f11e6d35600 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0c0e7ce600 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 
46523:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1d0165e200 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 46527:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1d0165ac00 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 24563:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f36eeb42c00 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 46517:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f11e6d34c00 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 46578:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f12f21b8e00 Jul 26 22:06:25 fir-md1-s1 kernel: LustreError: 27605:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1533c8e800 Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 26 22:06:25 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 26 22:06:27 fir-md1-s1 kernel: Lustre: 10585:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564203980/real 1564203985] req@ffff8f262d3b5d00 x1636747163915712/t0(0) o104->fir-MDT0000@10.8.15.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564203987 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 26 22:06:27 fir-md1-s1 kernel: LustreError: 21245:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f34f09e0050 x1631588546275760/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:20/0 lens 488/440 e 0 to 0 dl 1564204010 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:06:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 26 22:06:28 fir-md1-s1 kernel: LustreError: 24213:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on 
bulk READ req@ffff8f1143bd1050 x1631304973113232/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564204000 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:06:28 fir-md1-s1 kernel: LustreError: 24213:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 26 22:06:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 5c9f5376-a105-7e2f-1c52-759657f6fd7d (at 10.9.101.59@o2ib4), client will retry: rc -110 Jul 26 22:06:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 22:06:29 fir-md1-s1 kernel: LustreError: 46523:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2e77dcbc50 x1640013417448112/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:10/0 lens 488/440 e 1 to 0 dl 1564204000 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with eaf995be-0d27-b013-5e90-e619713af34c (at 10.8.13.6@o2ib6), client will retry: rc = -110 Jul 26 22:06:29 fir-md1-s1 kernel: LustreError: 46523:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 26 22:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 26 22:06:40 fir-md1-s1 kernel: Lustre: 21793:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a4ad34850 x1631573948285904/t0(0) o4->0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a@10.8.8.11@o2ib6:15/0 lens 504/448 e 1 to 0 dl 1564204005 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 22:06:45 fir-md1-s1 kernel: LustreError: 46527:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f2a4ad34850 x1631573948285904/t0(0) o4->0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a@10.8.8.11@o2ib6:15/0 lens 504/448 e 1 to 0 dl 1564204005 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Bulk IO write error with 0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a (at 10.8.8.11@o2ib6), client will retry: rc = -110 Jul 26 22:08:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 22:08:31 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 26 22:11:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 22:11:11 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 22:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 26 22:14:32 fir-md1-s1 kernel: Lustre: Skipped 400 previous similar messages Jul 26 22:15:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 8e701390-8699-82e1-92f2-9148262b7874 (at 10.8.17.20@o2ib6) reconnecting Jul 26 22:15:28 fir-md1-s1 kernel: Lustre: Skipped 274 previous similar messages Jul 26 22:20:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 26 22:20:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 26 22:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3497ca9000, cur 1564204838 expire 1564204688 last 1564204611 Jul 26 22:23:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 22:23:35 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 22:24:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 22:24:33 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 26 22:25:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 22:25:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 26 22:30:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 26 22:30:37 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 0, rc: 7 Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0682d1fe00 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f09318bd800 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2f8c176800 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 
ffff8f0682d18c00 Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 24563:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.3@o2ib6 from 10.0.10.51@o2ib7 Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 24563:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 334 previous similar messages Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 24563:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f09318bfc00 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2af262b600 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2af2629600 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f07b7e5dc00 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0ab4ad9600 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3e3b862200 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f36b63f7800 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f077125e800 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb6200 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2ec2de2800 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2a82c5d000 Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status 
-103, desc ffff8f0e1ae60200 Jul 26 22:31:04 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages Jul 26 22:31:04 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb6600 Jul 26 22:31:07 fir-md1-s1 kernel: LustreError: 21714:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f165e8dcc50 x1639988174162544/t0(0) o3->bd7772a0-5656-7b9e-2b19-3f87efa63ec1@10.8.15.6@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564205489 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with bd7772a0-5656-7b9e-2b19-3f87efa63ec1 (at 10.8.15.6@o2ib6), client will retry: rc -110 Jul 26 22:31:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 22:31:07 fir-md1-s1 kernel: LustreError: 21714:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 26 22:31:14 fir-md1-s1 kernel: Lustre: 25630:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f165e8de050 x1631580435949232/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:19/0 lens 488/440 e 1 to 0 dl 1564205479 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:14 fir-md1-s1 kernel: Lustre: 25630:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 22:31:15 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f33f2ac0850 x1640013460991184/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:19/0 lens 488/440 e 1 to 0 dl 1564205479 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:15 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 22:31:19 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ 
after 15+0s req@ffff8f1bf1164c50 x1638887693338672/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:19/0 lens 488/440 e 1 to 0 dl 1564205479 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6), client will retry: rc -110 Jul 26 22:31:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 22:31:19 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e6a6bbc50 x1638887693338832/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:24/0 lens 488/440 e 1 to 0 dl 1564205484 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:19 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 26 22:31:19 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 6 previous similar messages Jul 26 22:31:20 fir-md1-s1 kernel: LustreError: 25972:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f24957acc50 x1631580435950032/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564205489 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6), client will retry: rc -110 Jul 26 22:31:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 26 22:31:24 fir-md1-s1 kernel: Lustre: 6549:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f33f2ac6850 x1633753448313936/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564205489 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:24 fir-md1-s1 kernel: Lustre: 6549:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous 
similar messages Jul 26 22:31:24 fir-md1-s1 kernel: LustreError: 24565:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2d2a31f850 x1631588582635520/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:24/0 lens 488/440 e 1 to 0 dl 1564205484 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 11f7dba6-7171-5836-2062-1974c5637c6a (at 10.8.28.11@o2ib6), client will retry: rc -110 Jul 26 22:31:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 22:31:24 fir-md1-s1 kernel: LustreError: 24565:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 22:31:25 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1bf1167850 x1640013460991680/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564205489 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:25 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 26 22:31:29 fir-md1-s1 kernel: LustreError: 22059:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 25+0s req@ffff8f33f2ac6850 x1633753448313936/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564205489 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 22:31:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 26 22:31:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 26 22:34:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 22:34:36 fir-md1-s1 kernel: Lustre: Skipped 158 previous similar messages Jul 26 22:35:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 26 22:35:02 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 26 22:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 22:35:51 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 26 22:40:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 22:40:44 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 26 22:44:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 22:44:37 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 26 22:46:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 22:46:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 26 22:46:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 22:46:05 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 26 22:51:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 22:51:15 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 22:54:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 22:54:41 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 26 22:56:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 22:56:16 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 26 22:56:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 26 22:56:35 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 26 23:01:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 23:01:56 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 26 23:04:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 23:04:43 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 26 23:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 26 23:06:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 26 23:09:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=18 reqQ=0 recA=23, svcEst=1, delay=6528 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 44036:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.10@o2ib4: deadline 6:1s ago req@ffff8f1c333a9450 x1638905235362528/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 488/0 e 0 to 0 dl 1564207770 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 46534:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8f1ce3f02450 x1639958366672528/t0(0) o3->f111b25a-6d2a-16a8-5df8-392d9e810365@10.8.15.4@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564207770 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 44036:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 360 previous similar messages Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 46578:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f1db695c450 x1638905235362816/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 488/0 e 0 to 0 dl 1564207770 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 46578:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 55488:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f1b3cc74850 x1635092405208464/t0(0) o400->22de919f-a3b7-9100-af0c-8f708d4ead17@10.9.105.4@o2ib4:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 26 23:09:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.203@o2ib7 (0): c: 8, oc: 0, rc: 8 Jul 26 23:09:32 fir-md1-s1 kernel: LNetError: 46510:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 46567:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+2s req@ffff8f1db695d850 x1631630763972464/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564207770 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 46567:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 22958:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc 
ffff8f270a817a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3980230a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 24572:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f284470a600 Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 21073:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564207764/real 1564207764] req@ffff8f0a8f1a3000 x1636747198416944/t0(0) o106->fir-MDT0002@10.9.106.24@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564207771 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f093374fc00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f077125e600 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f270a814a00 Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21bec43e00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:305:request_in_callback()) event type 2, status -103, service mdt_io Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 14790:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 14790:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.109.42@o2ib4 x1635197823365856 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f8c175200 Jul 26 23:09:32 fir-md1-s1 
kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21bec41800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dfcacd800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d96e61a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34f41bd800 Jul 26 23:09:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.209@o2ib7: 2 seconds Jul 26 23:09:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 5 previous similar messages Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f350fb8b200 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f093374a400 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2601eb0a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f342e0baa00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f8c175400 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34f41bb600 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3980234600 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17df7b5c00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d96e63400 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2601eb4800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f21bec42a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3b948e4800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34f41bb600 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f145f39dc00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2d96e63400 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3980230800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f09337f3c00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2f8c172200 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f312fb4ec00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0afbbd5000 Jul 26 23:09:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.211@o2ib7: accepting Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: fir-OST0006-osc-MDT0002: Connection to fir-OST0006 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery 
to complete Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350fb89a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f180a27e000 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ceb6f4800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f34f41bc400 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1dfcace600 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21bec42a00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d96e67e00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f8c176800 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34f41b8e00 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dfcace000 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d96e62000 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dfcaca200 Jul 26 23:09:32 fir-md1-s1 kernel: LNetError: 23723:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.29.1@o2ib6 from 10.0.10.51@o2ib7 Jul 26 23:09:32 fir-md1-s1 kernel: LNetError: 46510:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 26 previous similar 
messages Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 46510:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2c8154f400 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+2s req@ffff8f3271166050 x1638929097342896/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564207770 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:32 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 51 previous similar messages Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 46510:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. req@ffff8f3271166050 x1638929097342896/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564207770 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 26 23:09:32 fir-md1-s1 kernel: Lustre: 46510:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 73 previous similar messages Jul 26 23:09:34 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f179f182050 x1631588606025968/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1564207794 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:34 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 26 23:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 26 23:09:34 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 26 23:09:37 fir-md1-s1 kernel: LustreError: 46528:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+6s req@ffff8f3271160850 x1638875208308848/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:0/0 lens 488/440 e 0 to 0 dl 
1564207770 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 26 23:09:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 23:09:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 59f098aa-fb21-8ed8-84bd-d0ce06cad654 (at 10.9.102.46@o2ib4), client will retry: rc = -110 Jul 26 23:09:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 26 23:09:37 fir-md1-s1 kernel: Lustre: 20500:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. req@ffff8f12e2417850 x1638934743644640/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564207770 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 26 23:09:37 fir-md1-s1 kernel: Lustre: 20500:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 26 23:09:37 fir-md1-s1 kernel: LustreError: 46528:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 26 23:09:39 fir-md1-s1 kernel: Lustre: 46550:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3be019a850 x1638876294906160/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564207784 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:39 fir-md1-s1 kernel: Lustre: 46550:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 26 23:09:44 fir-md1-s1 kernel: LustreError: 46543:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 12+0s req@ffff8f3996f79450 x1638905235362592/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564207784 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 
97481f17-b98d-0828-17b9-32f14b205b6e (at 10.9.114.13@o2ib4), client will retry: rc -110 Jul 26 23:09:44 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 26 23:09:44 fir-md1-s1 kernel: LustreError: 46543:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 26 23:09:47 fir-md1-s1 kernel: Lustre: 46559:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1c333ac850 x1631580491648096/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:22/0 lens 488/440 e 1 to 0 dl 1564207792 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:47 fir-md1-s1 kernel: Lustre: 46559:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 26 23:09:47 fir-md1-s1 kernel: Lustre: 21740:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f35daf93850 x1639508372573888/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564207784 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 26 23:09:47 fir-md1-s1 kernel: Lustre: 21740:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 26 23:09:49 fir-md1-s1 kernel: Lustre: 23563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0521fdcb00 x1634137549872448/t0(0) o101->05133d08-3c30-bc0b-3005-cf52634e4b28@10.9.101.47@o2ib4:24/0 lens 480/568 e 0 to 0 dl 1564207794 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:49 fir-md1-s1 kernel: Lustre: 23563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 23:09:52 fir-md1-s1 kernel: LustreError: 22226:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f37cea56850 x1639234106964768/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:22/0 lens 488/440 e 1 to 0 dl 1564207792 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 26 23:09:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f111b25a-6d2a-16a8-5df8-392d9e810365 (at 10.8.15.4@o2ib6), client will retry: rc -110 Jul 26 23:09:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 26 23:09:52 fir-md1-s1 kernel: LustreError: 22226:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Jul 26 23:09:56 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f34abf68c50 x1640013515048736/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:2/0 lens 488/440 e 0 to 0 dl 1564207802 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:57 fir-md1-s1 kernel: Lustre: 20503:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1c333afc50 x1631575454285696/t0(0) o3->a2c44fb9-486a-447c-ab16-c5c889d1e2f3@10.8.27.3@o2ib6:2/0 lens 488/440 e 0 to 0 dl 1564207802 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:09:57 fir-md1-s1 kernel: Lustre: 20503:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 26 previous similar messages Jul 26 23:09:57 fir-md1-s1 kernel: Lustre: 46542:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3s); client may timeout. 
req@ffff8f424c665850 x1631630763973648/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564207794 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 26 23:10:02 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f34da27fc50 x1631630763974128/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564207802 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:10:02 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 26 previous similar messages Jul 26 23:10:02 fir-md1-s1 kernel: LustreError: 46568:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f16ea4b2850 x1639988224076496/t0(0) o3->bd7772a0-5656-7b9e-2b19-3f87efa63ec1@10.8.15.6@o2ib6:2/0 lens 488/440 e 0 to 0 dl 1564207802 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:10:02 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1b599cd400 x1638872291440736/t0(0) o101->191e7928-23a0-eccc-c908-3ef7952d34e9@10.9.103.35@o2ib4:7/0 lens 480/568 e 0 to 0 dl 1564207807 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:10:02 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 26 23:10:16 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 43s: evicting client at 10.9.103.5@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f397cefb840/0x5d9ee689b9972deb lrc: 3/0,0 mode: PW/PW res: [0x2c002c4a0:0x8be:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.9.103.5@o2ib4 remote: 0xfd33651675b45101 expref: 466 pid: 23587 timeout: 3322862 lvb_type: 0 Jul 26 23:11:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 26 23:11:20 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 23:11:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 23:11:59 fir-md1-s1 kernel: Lustre: Skipped 300 previous similar messages Jul 26 23:14:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 23:14:54 fir-md1-s1 kernel: Lustre: Skipped 1044 previous similar messages Jul 26 23:16:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 26 23:16:44 fir-md1-s1 kernel: Lustre: Skipped 718 previous similar messages Jul 26 23:21:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 23:21:35 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 26 23:21:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f193b81ec00, cur 1564208497 expire 1564208347 last 1564208270 Jul 26 23:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 26 23:25:04 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 26 23:26:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 23:26:09 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 26 23:26:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 26 23:26:54 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 26 23:35:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 26 23:35:16 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 26 23:36:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 23:36:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 26 23:36:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 23:36:51 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 26 23:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 26 23:37:13 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 26 23:45:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 26 23:45:28 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). 
Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 23672:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=2, svcEst=20, delay=7271 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 23672:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 6 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 46537:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f44e080a850 x1634531395013088/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:10/0 lens 488/440 e 0 to 0 dl 1564209970 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 20501:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f0fb499ac50 x1635089925079424/t0(0) o4->ddc9790c-0eb3-6a50-110f-d17442bde73c@10.9.107.53@o2ib4:0/0 lens 2824/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 22430:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f0d74650450 x1638951213980560/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f0e612b9850 x1638934836362176/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 21291:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.13@o2ib4: deadline 6:2s ago req@ffff8f204dcdf850 x1638898251503856/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:10/0 lens 488/0 e 0 to 0 dl 1564209970 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 46537:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 84 previous 
similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 20501:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 22430:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 4 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 21291:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 10 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 46554:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. req@ffff8f16cf348050 x1638824537464032/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:10/0 lens 488/0 e 0 to 0 dl 1564209970 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 24584:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564209964/real 1564209964] req@ffff8f1a44f55700 x1636747218109952/t0(0) o104->fir-MDT0002@10.9.103.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564209971 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 24584:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 3 seconds Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 1, oc: 2, rc: 3 Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 
20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2f8fa5cc00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2e1fe6c800 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2f8fa5fa00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3ff9ca2a00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0b666c8a00 Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 23738:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.20.24@o2ib6 from 10.0.10.51@o2ib7 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f06270f1600 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3eb19f4400 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 46594:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3ff9ca1400 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 24567:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0b5100d000 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 46517:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec9b2ca00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 97599:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec9b2be00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 23106:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f143c66fe00 Jul 26 23:46:13 
fir-md1-s1 kernel: LustreError: 22059:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0b666cf800 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 24564:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3980237800 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 46555:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f35baac0a00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 25972:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff8f16cf222050 x1638824537463648/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:9/0 lens 488/440 e 0 to 0 dl 1564209969 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 25972:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 27581:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3ff9ca3000 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 21737:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0b666cd600 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f7baec68-f8c8-0730-9508-ba1e77698953 (at 10.9.114.6@o2ib4), client will retry: rc -110 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 46520:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2f8fa5be00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 46535:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3ff9ca1200 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 42895:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1925f95600 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 21449:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3ff9ca4c00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 
42894:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f23d477e000 Jul 26 23:46:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 3 seconds Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2ec9b2da00 Jul 26 23:46:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 32 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 46588:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3e6b172c00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f23d4778e00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 24569:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f143c668c00 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 21541:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3e6b170000 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0e65ff9200 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f06270f2200 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2ec9b2ea00 Jul 26 23:46:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.201@o2ib7: connected Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3ff9ca6400 Jul 26 23:46:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0b666c9e00 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: 
Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=5, svcEst=20, delay=7763 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 11 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f37469b8300 x1634146366364656/t0(0) o101->cc4008f6-fb0a-3a63-7de5-6cb4e06911a9@10.9.101.44@o2ib4:9/0 lens 480/568 e 0 to 0 dl 1564209969 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:13 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 55555:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.21.35@o2ib6 from 10.0.10.51@o2ib7 Jul 26 23:46:13 fir-md1-s1 kernel: LNetError: 55555:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 27 previous similar messages Jul 26 23:46:15 fir-md1-s1 kernel: LustreError: 22958:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2f4888a450 x1638875236159632/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564209993 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97481f17-b98d-0828-17b9-32f14b205b6e (at 10.9.114.13@o2ib4), client will retry: rc -110 Jul 26 23:46:15 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 26 23:46:15 fir-md1-s1 kernel: LustreError: 22958:0:(ldlm_lib.c:3258:target_bulk_io()) 
Skipped 2 previous similar messages Jul 26 23:46:19 fir-md1-s1 kernel: Lustre: 23563:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564209972/real 1564209972] req@ffff8f138918b900 x1636747218109984/t0(0) o104->fir-MDT0002@10.9.103.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564209979 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 26 23:46:19 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:9s); client may timeout. req@ffff8f1db8691500 x1631572064143312/t0(0) o101->a687dd21-1bbe-233b-d907-3cc9986eac5f@10.9.103.28@o2ib4:10/0 lens 480/536 e 0 to 0 dl 1564209970 ref 1 fl Complete:/0/0 rc 0/0 Jul 26 23:46:19 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 26 previous similar messages Jul 26 23:46:19 fir-md1-s1 kernel: Lustre: 23563:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 29 previous similar messages Jul 26 23:46:28 fir-md1-s1 kernel: Lustre: 22988:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f371c303c50 x1638898251502880/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564209993 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:28 fir-md1-s1 kernel: Lustre: 22988:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 23:46:29 fir-md1-s1 kernel: Lustre: 22988:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f398364a050 x1637106128258624/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:4/0 lens 488/440 e 0 to 0 dl 1564209994 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:29 fir-md1-s1 kernel: Lustre: 22988:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Jul 26 23:46:33 fir-md1-s1 kernel: LustreError: 21711:0:(ldlm_lib.c:3248:target_bulk_io()) 
@@@ timeout on bulk READ after 21+0s req@ffff8f09e73fb450 x1639234188015088/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564209993 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c4566649-5001-d956-15cb-934d725d7f29 (at 10.9.113.11@o2ib4), client will retry: rc -110 Jul 26 23:46:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 26 23:46:33 fir-md1-s1 kernel: LustreError: 21711:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 35 previous similar messages Jul 26 23:46:33 fir-md1-s1 kernel: Lustre: 26256:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f20e09e1b00 x1640050594959184/t0(0) o101->bbaa1906-af49-bd8d-7e3e-fd864792512f@10.9.103.32@o2ib4:8/0 lens 480/568 e 1 to 0 dl 1564209998 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:33 fir-md1-s1 kernel: Lustre: 26256:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 43 previous similar messages Jul 26 23:46:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.41@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f3d1a60f2c0/0x5d9ee689d647afee lrc: 3/0,0 mode: PW/PW res: [0x200029e13:0x1dbf:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.9.101.41@o2ib4 remote: 0xa7aa4c6194318538 expref: 38 pid: 23631 timeout: 3325053 lvb_type: 0 Jul 26 23:46:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 26 23:46:33 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:24s); client may timeout. 
req@ffff8f2d0e11e900 x1634137606989248/t0(0) o101->05133d08-3c30-bc0b-3005-cf52634e4b28@10.9.101.47@o2ib4:9/0 lens 480/536 e 0 to 0 dl 1564209969 ref 1 fl Complete:/0/0 rc 0/0 Jul 26 23:46:37 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 22+3s req@ffff8f262e36a050 x1638887723512624/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:4/0 lens 488/440 e 0 to 0 dl 1564209994 ref 1 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:37 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 48 previous similar messages Jul 26 23:46:40 fir-md1-s1 kernel: Lustre: 21364:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:7s); client may timeout. req@ffff8f371c303c50 x1638898251502880/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564209993 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 26 23:46:40 fir-md1-s1 kernel: Lustre: 46542:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:7s); client may timeout. 
req@ffff8f40f7e68450 x1638870141194816/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564209993 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 26 23:46:40 fir-md1-s1 kernel: Lustre: 46542:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 26 23:46:40 fir-md1-s1 kernel: Lustre: 21364:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 26 23:46:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 26 23:46:41 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 26 23:46:43 fir-md1-s1 kernel: Lustre: 23705:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f411cb1b000 x1636451717164048/t0(0) o36->5580c86e-93fc-ec0b-7809-c452eedb4044@10.9.106.23@o2ib4:18/0 lens 536/2888 e 0 to 0 dl 1564210008 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:46:43 fir-md1-s1 kernel: Lustre: 23705:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 26 23:46:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.13@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ed66a7500/0x5d9ee689d61a56f3 lrc: 3/0,0 mode: PW/PW res: [0x2c002c69a:0x1f1:0x0].0x0 bits 0x40/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.9.103.13@o2ib4 remote: 0x2718299e5b591da4 expref: 313 pid: 23758 timeout: 3325063 lvb_type: 0 Jul 26 23:46:47 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f206132af40/0x5d9ee689d60eac1c lrc: 3/0,0 mode: PW/PW res: [0x2c002c4cc:0x853:0x0].0x0 bits 0x40/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.9.103.35@o2ib4 
remote: 0xd35d4f400bcc70e4 expref: 932 pid: 97647 timeout: 3325067 lvb_type: 0 Jul 26 23:46:47 fir-md1-s1 kernel: LustreError: 23731:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1b1caecc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f36187b4380/0x5d9ee689d647e015 lrc: 3/0,0 mode: --/PW res: [0x2c002c22a:0x4df0:0x0].0x0 bits 0x40/0x0 rrc: 18 type: IBT flags: 0x54a01400000020 nid: 10.9.103.35@o2ib4 remote: 0xd35d4f400bcc720a expref: 316 pid: 23731 timeout: 0 lvb_type: 0 Jul 26 23:46:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 52s: evicting client at 10.9.103.29@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1c0ee572c0/0x5d9ee689d64772bc lrc: 3/0,0 mode: PW/PW res: [0x2c002be94:0x23d4:0x0].0x0 bits 0x40/0x0 rrc: 26 type: IBT flags: 0x60200400000020 nid: 10.9.103.29@o2ib4 remote: 0x424e91e60b9222bb expref: 684 pid: 22007 timeout: 3325076 lvb_type: 0 Jul 26 23:46:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 26 23:46:57 fir-md1-s1 kernel: Lustre: 23563:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:46s); client may timeout. 
req@ffff8f1271c1d400 x1638853125315600/t0(0) o101->067c479d-9c5c-ba9a-1825-5f3ac7b0af53@10.9.103.23@o2ib4:10/0 lens 480/536 e 0 to 0 dl 1564209970 ref 1 fl Complete:/0/0 rc 0/0 Jul 26 23:46:57 fir-md1-s1 kernel: Lustre: 23563:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 26 23:46:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 26 23:46:57 fir-md1-s1 kernel: Lustre: Skipped 161 previous similar messages Jul 26 23:47:01 fir-md1-s1 kernel: Lustre: 23575:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f132e36e000 x1638091117043632/t0(0) o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:6/0 lens 544/2888 e 0 to 0 dl 1564210026 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:47:01 fir-md1-s1 kernel: Lustre: 23575:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 26 23:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9d9a34f1-f4e0-0f10-cc72-f899159f3999 (at 10.9.108.44@o2ib4) reconnecting Jul 26 23:47:16 fir-md1-s1 kernel: Lustre: Skipped 599 previous similar messages Jul 26 23:47:17 fir-md1-s1 kernel: LustreError: 21378:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f252baba800 ns: mdt-fir-MDT0000_UUID lock: ffff8f39e8ca9f80/0x5d9ee689d647df9e lrc: 3/0,0 mode: PW/PW res: [0x200029e13:0x1dbf:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x50200400000020 nid: 10.9.101.41@o2ib4 remote: 0xa7aa4c619431861f expref: 2 pid: 21378 timeout: 0 lvb_type: 0 Jul 26 23:47:17 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:43s); client may timeout. 
req@ffff8f3a41ad4e00 x1631691822125328/t0(0) o101->e891cc28-9c10-be1b-29fe-00592513d891@10.9.101.41@o2ib4:4/0 lens 480/536 e 0 to 0 dl 1564209994 ref 1 fl Complete:/0/0 rc -107/-107 Jul 26 23:47:17 fir-md1-s1 kernel: Lustre: 21378:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 26 23:47:25 fir-md1-s1 kernel: Lustre: 23621:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f418caa6f00 x1634137607698336/t0(0) o101->05133d08-3c30-bc0b-3005-cf52634e4b28@10.9.101.47@o2ib4:0/0 lens 480/568 e 0 to 0 dl 1564210050 ref 2 fl Interpret:/0/0 rc 0/0 Jul 26 23:47:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 26 23:47:56 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 26 23:48:30 fir-md1-s1 kernel: LustreError: 20554:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564210020, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f39cd098000/0x5d9ee689d65a8f8b lrc: 3/0,1 mode: --/PW res: [0x200029e13:0x1dbf:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20554 timeout: 0 lvb_type: 0 Jul 26 23:48:47 fir-md1-s1 kernel: LustreError: 23727:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564210037, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f4021f0b3c0/0x5d9ee689d69975c2 lrc: 3/0,1 mode: --/PW res: [0x200029e13:0x1dbf:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23727 timeout: 0 lvb_type: 0 Jul 26 23:50:21 fir-md1-s1 kernel: LNet: Service thread pid 20554 was inactive for 200.25s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 26 23:50:21 fir-md1-s1 kernel: Pid: 20554, comm: mdt03_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 26 23:50:21 fir-md1-s1 kernel: Call Trace: Jul 26 23:50:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 26 23:50:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 26 23:50:21 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 26 23:50:21 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 26 23:50:21 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 26 23:50:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 26 23:50:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 26 23:50:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 26 23:50:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 26 23:50:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 26 23:50:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564210221.20554 Jul 26 23:50:38 fir-md1-s1 kernel: LNet: Service thread pid 23727 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 26 23:50:38 fir-md1-s1 kernel: Pid: 23727, comm: mdt03_104 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 26 23:50:38 fir-md1-s1 kernel: Call Trace: Jul 26 23:50:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 26 23:50:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 26 23:50:38 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 26 23:50:38 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 26 23:50:38 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 26 23:50:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 26 23:50:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 26 23:50:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 26 23:50:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 26 23:50:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 26 23:50:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564210238.23727 Jul 26 23:55:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 5705ebfa-7bbb-a9cf-a915-c93f79a93acf (at 10.9.101.44@o2ib4) Jul 26 23:55:34 fir-md1-s1 kernel: Lustre: Skipped 889 previous similar messages Jul 26 23:57:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 05133d08-3c30-bc0b-3005-cf52634e4b28 (at 10.9.101.47@o2ib4) reconnecting Jul 26 23:57:21 fir-md1-s1 kernel: Lustre: Skipped 74 previous 
similar messages Jul 27 00:00:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 00:00:15 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 00:01:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2bba48a400, cur 1564210869 expire 1564210719 last 1564210642 Jul 27 00:01:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.101.22@o2ib4, removing former export from same NID Jul 27 00:01:10 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 27 00:04:09 fir-md1-s1 kernel: LNet: Service thread pid 20554 completed after 1029.09s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 27 00:04:09 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Jul 27 00:05:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 00:05:43 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 27 00:07:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 00:07:31 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 27 00:13:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 00:13:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 00:13:37 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 00:15:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 00:15:47 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 14102:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=33 reqQ=0 recA=32, svcEst=1, delay=6333 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 14102:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 21388:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.112.10@o2ib4: deadline 6:1s ago req@ffff8f202fe2dc50 x1638929192136304/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:28/0 lens 488/0 e 0 to 0 dl 1564211818 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 46552:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.8@o2ib4: deadline 6:1s ago req@ffff8f1bac597c50 x1634531437157168/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:28/0 lens 488/0 e 0 to 0 dl 1564211818 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 27603:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f1c08163c50 x1638929192136432/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:28/0 lens 488/0 e 0 to 0 dl 1564211818 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 21388:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 46552:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 27603:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 46564:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. req@ffff8f1863e5b450 x1638887735762528/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:28/0 lens 488/0 e 0 to 0 dl 1564211818 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 46564:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 00:17:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (7): c: 2, oc: 0, rc: 7 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 22280:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f1e5fd62a00 x1638929192136352/t0(0) o101->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:0/0 lens 1768/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 22280:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 6 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 23643:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564211812/real 1564211812] req@ffff8f418ca2e300 
x1636747234794528/t0(0) o104->fir-MDT0002@10.9.103.2@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564211819 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 14791:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f0ecb99e050 x1631630888462704/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 22975:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f280f594c50 x1631580550187056/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 14791:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 17 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 22975:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 18 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 00:17:00 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2506c7ae00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68e600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f3eb19f1e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e6b172800 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3f07289c00 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LNetError: 21446:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.15.4@o2ib6 from 10.0.10.51@o2ib7 Jul 27 00:17:00 fir-md1-s1 kernel: LNetError: 21446:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 1 previous similar message Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68fe00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f132d627600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68ca00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0e1ae62a00 Jul 27 00:17:00 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3f0728ea00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e6b173c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3741a2ac00 Jul 27 00:17:00 
fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f22caf97a00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a82c5ae00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a82c5c800 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3f07288400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1803e58200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f20731eaa00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2a82c5c200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bacddb000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec9b2cc00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f132d627200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3f07288c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44bc67e000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a08c4b800 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f22caf92200 Jul 27 
00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1bc53a5c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3980233a00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f06270f1000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f132d621c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3eb19f7000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f203aa73400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a08c4c200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f23d477e000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f132d622e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d36600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3980232e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f22caf94a00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3ec9058400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a08c4de00 Jul 
27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d36c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f079dcafa00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f342e0bf000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1a08c4f400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f23d4779e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2506c7b600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1bc53a6800 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f203aa76c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1925f90a00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0e1ae66800 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bc53a2600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f12f21bda00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bacdde000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10fee25c00 
Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f36bde59200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1803e5a000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec9b2a000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f12f21bb400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3f0728a400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 21540:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1bacdddc00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 27481:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1bc53a2600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 21737:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec9b2be00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 27602:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3741a29400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 46568:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1925f97200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 46578:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1803e58200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 21535:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2a82c5c200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 22058:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2371e17e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f376c65c400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bacddc600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d34400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f342e0bf000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2506c7fe00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f143c66c600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f079dcafa00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a6cd87000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d30c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1925f90a00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f143c66b000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3a6b72ec00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d31600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44bc67e000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, 
desc ffff8f077125aa00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06270f3e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c23f72200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec9b2ee00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1803e5a400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f079dcade00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e6b173c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06270f4200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06270f4600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38b0ae8200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3ec9058a00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2371e13400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b51008c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d36600 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f2844709e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d35200 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3eb19f0e00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3f0728a400 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f36bde5a000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1bacdddc00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11e6d31c00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68f000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 46527:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2cb14abc50 x1640013627970560/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=46 reqQ=0 recA=15, svcEst=20, delay=6165 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 12 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f31510f7050 x1638876417746080/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 00:17:00 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 138 previous similar messages Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68ba00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f284470e000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68ec00 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f336f68c000 Jul 27 00:17:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f336f68dc00 Jul 27 00:17:01 fir-md1-s1 kernel: Lustre: 20228:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564211819/real 1564211821] req@ffff8f322a8eec00 x1636747234794768/t0(0) o13->fir-OST0022-osc-MDT0002@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564211826 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 27 00:17:01 fir-md1-s1 kernel: Lustre: fir-OST001c-osc-MDT0002: Connection to fir-OST001c (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 00:17:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 00:17:01 fir-md1-s1 kernel: Lustre: 20228:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages Jul 27 00:17:02 fir-md1-s1 kernel: LustreError: 69435:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2523730450 x1631630752249232/t0(0) 
o4->96d38695-b12e-ad38-5a89-620c7e3a5eec@10.9.102.68@o2ib4:20/0 lens 504/448 e 1 to 0 dl 1564211840 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 96d38695-b12e-ad38-5a89-620c7e3a5eec (at 10.9.102.68@o2ib4), client will retry: rc = -110 Jul 27 00:17:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 00:17:04 fir-md1-s1 kernel: LustreError: 46571:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+6s req@ffff8f113ca03450 x1638085376992512/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with bf0fab1f-ed86-800d-24d6-23f47310966d (at 10.9.113.8@o2ib4), client will retry: rc -110 Jul 27 00:17:04 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Jul 27 00:17:04 fir-md1-s1 kernel: Lustre: 21793:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. 
req@ffff8f2cb14a9450 x1634531437156512/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 00:17:04 fir-md1-s1 kernel: Lustre: 21793:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 164 previous similar messages Jul 27 00:17:04 fir-md1-s1 kernel: LustreError: 24572:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2f331f0850 x1639959888200176/t0(0) o4->f514cc7a-9bbf-6a9c-dfda-7e21d4d17fbe@10.8.9.9@o2ib6:19/0 lens 488/448 e 1 to 0 dl 1564211839 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with f514cc7a-9bbf-6a9c-dfda-7e21d4d17fbe (at 10.8.9.9@o2ib6), client will retry: rc = -110 Jul 27 00:17:04 fir-md1-s1 kernel: LustreError: 46571:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 103 previous similar messages Jul 27 00:17:05 fir-md1-s1 kernel: LustreError: 6550:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+7s req@ffff8f34f0a0fc50 x1638905342940864/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:05 fir-md1-s1 kernel: LustreError: 21537:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+7s req@ffff8f2ae12d9050 x1639508510514720/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564211818 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:05 fir-md1-s1 kernel: LustreError: 21537:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 00:17:06 fir-md1-s1 kernel: Lustre: 20723:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564211819/real 1564211819] req@ffff8f2679639500 x1636747234794496/t0(0) o104->fir-MDT0002@10.9.103.23@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564211826 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 00:17:06 
fir-md1-s1 kernel: Lustre: 20723:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 27 00:17:14 fir-md1-s1 kernel: Lustre: 23565:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34f56eef00 x1638853125908752/t0(0) o101->067c479d-9c5c-ba9a-1825-5f3ac7b0af53@10.9.103.23@o2ib4:19/0 lens 480/568 e 1 to 0 dl 1564211839 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:14 fir-md1-s1 kernel: Lustre: 23565:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 00:17:19 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f363e632c50 x1638830540928128/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:19/0 lens 488/440 e 1 to 0 dl 1564211839 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97481f17-b98d-0828-17b9-32f14b205b6e (at 10.9.114.13@o2ib4), client will retry: rc -110 Jul 27 00:17:19 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 27 00:17:19 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 00:17:20 fir-md1-s1 kernel: Lustre: 23572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0793a08300 x1631743193296800/t0(0) o101->dad5e408-d765-51d9-1659-bc9a52227289@10.9.103.30@o2ib4:25/0 lens 480/568 e 1 to 0 dl 1564211845 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:20 fir-md1-s1 kernel: Lustre: 23572:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 27 00:17:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 34s: evicting client at 10.9.103.29@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f16f4e6a400/0x5d9ee689eeb49d67 lrc: 3/0,0 mode: PW/PW res: 
[0x2c002c6b0:0x3:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.103.29@o2ib4 remote: 0x424e91e60b927cc6 expref: 501 pid: 97643 timeout: 3326906 lvb_type: 0 Jul 27 00:17:26 fir-md1-s1 kernel: Lustre: 21680:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:28s); client may timeout. req@ffff8f418ca2e900 x1638872292865232/t0(0) o101->191e7928-23a0-eccc-c908-3ef7952d34e9@10.9.103.35@o2ib4:28/0 lens 480/536 e 0 to 0 dl 1564211818 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 00:17:26 fir-md1-s1 kernel: Lustre: 21680:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 10 previous similar messages Jul 27 00:17:31 fir-md1-s1 kernel: Lustre: 23604:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2fb8d69200 x1639323814907264/t0(0) o101->cdec277e-fcb0-d7ee-939c-c22853f65ec1@10.9.103.14@o2ib4:6/0 lens 480/568 e 0 to 0 dl 1564211856 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:17:31 fir-md1-s1 kernel: Lustre: 23604:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 27 00:17:31 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.33@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f14d7f94a40/0x5d9ee689eea2c714 lrc: 3/0,0 mode: PW/PW res: [0x2c002c22a:0x4e31:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.9.103.33@o2ib4 remote: 0x67df4151d22d5257 expref: 541 pid: 20738 timeout: 3326911 lvb_type: 0 Jul 27 00:17:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client da74a6ae-9e7c-db01-39f9-c8d7b66544b1 (at 10.9.101.19@o2ib4) reconnecting Jul 27 00:17:32 fir-md1-s1 kernel: Lustre: Skipped 1670 previous similar messages Jul 27 00:17:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.7@o2ib4 ns: 
mdt-fir-MDT0002_UUID lock: ffff8f410fb960c0/0x5d9ee689eeb1d3fa lrc: 3/0,0 mode: PW/PW res: [0x2c002c501:0x8be:0x0].0x0 bits 0x40/0x0 rrc: 20 type: IBT flags: 0x60200400000020 nid: 10.9.103.7@o2ib4 remote: 0x2f1f4b1048e5dc58 expref: 531 pid: 23626 timeout: 3326915 lvb_type: 0 Jul 27 00:17:35 fir-md1-s1 kernel: LustreError: 23643:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f178379a000 ns: mdt-fir-MDT0002_UUID lock: ffff8f3656a886c0/0x5d9ee689eeb4c163 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6b2:0x54:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x50200400000020 nid: 10.9.103.33@o2ib4 remote: 0x67df4151d22d52e3 expref: 2 pid: 23643 timeout: 0 lvb_type: 0 Jul 27 00:17:49 fir-md1-s1 kernel: LustreError: 97656:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4518921800 ns: mdt-fir-MDT0002_UUID lock: ffff8f203323a1c0/0x5d9ee689eeb4c124 lrc: 3/0,0 mode: PW/PW res: [0x2c002c4cd:0x500:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x50200400000020 nid: 10.9.103.7@o2ib4 remote: 0x2f1f4b1048e5dc7b expref: 2 pid: 97656 timeout: 0 lvb_type: 0 Jul 27 00:17:49 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:51s); client may timeout. 
req@ffff8f1f7149a700 x1631563734678544/t0(0) o101->492585ec-a5aa-1ba0-19ca-f69975156f5c@10.9.103.7@o2ib4:28/0 lens 480/536 e 0 to 0 dl 1564211818 ref 1 fl Complete:/0/0 rc -107/-107 Jul 27 00:17:49 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 00:18:30 fir-md1-s1 kernel: LustreError: 10195:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564211820, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0a02a15c40/0x5d9ee689eeb63145 lrc: 3/0,1 mode: --/PW res: [0x2c002c6b2:0x54:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 10195 timeout: 0 lvb_type: 0 Jul 27 00:18:35 fir-md1-s1 kernel: LustreError: 23708:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564211825, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f125f7598c0/0x5d9ee689eeb9140b lrc: 3/0,1 mode: --/PW res: [0x2c002c6b2:0x54:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23708 timeout: 0 lvb_type: 0 Jul 27 00:18:37 fir-md1-s1 kernel: LustreError: 23587:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564211827, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3d0ebc86c0/0x5d9ee689eeb9d4c3 lrc: 3/0,1 mode: --/PW res: [0x2c002c6b2:0x54:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23587 timeout: 0 lvb_type: 0 Jul 27 00:18:37 fir-md1-s1 kernel: LustreError: 23587:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 27 00:20:21 fir-md1-s1 kernel: LNet: Service thread pid 10195 was inactive for 200.45s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 27 00:20:21 fir-md1-s1 kernel: Pid: 10195, comm: mdt00_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 00:20:21 fir-md1-s1 kernel: Call Trace: Jul 27 00:20:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 00:20:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 00:20:21 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 00:20:21 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 00:20:21 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 00:20:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 00:20:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 00:20:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 00:20:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 00:20:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 00:20:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564212021.10195 Jul 27 00:20:25 fir-md1-s1 kernel: LNet: Service thread pid 23708 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 00:20:25 fir-md1-s1 kernel: Pid: 23708, comm: mdt00_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 00:20:25 fir-md1-s1 kernel: Call Trace: Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 00:20:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 00:20:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 00:20:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564212025.23708 Jul 27 00:20:25 fir-md1-s1 kernel: Pid: 20541, comm: mdt00_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 00:20:25 fir-md1-s1 kernel: Call Trace: Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] 
Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 00:20:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 00:20:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 00:20:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 00:20:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 00:20:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 00:20:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 00:20:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564212026.20541 Jul 27 00:20:27 fir-md1-s1 kernel: LNet: Service thread pid 23587 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 00:20:27 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 27 00:20:27 fir-md1-s1 kernel: Pid: 23587, comm: mdt03_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 00:20:27 fir-md1-s1 kernel: Call Trace: Jul 27 00:20:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 00:20:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 00:20:27 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 00:20:27 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 00:20:27 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 00:20:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 00:20:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 00:20:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 00:20:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 00:20:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 00:20:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564212027.23587 Jul 27 00:21:10 fir-md1-s1 kernel: LNet: Service thread pid 10195 completed after 249.69s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 27 00:21:10 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Jul 27 00:23:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 00:23:56 fir-md1-s1 kernel: Lustre: Skipped 860 previous similar messages Jul 27 00:26:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 00:26:02 fir-md1-s1 kernel: Lustre: Skipped 2724 previous similar messages Jul 27 00:26:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 00:26:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 00:27:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 00:27:39 fir-md1-s1 kernel: Lustre: Skipped 194 previous similar messages Jul 27 00:34:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 00:34:12 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 27 00:36:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 00:36:14 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 27 00:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 00:38:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 00:39:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 00:39:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 00:44:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 00:44:16 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 00:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 00:46:35 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Jul 27 00:48:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 00:48:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 27 00:55:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 00:55:36 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 00:56:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 00:56:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 00:56:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 00:56:44 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 00:57:24 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 00:57:24 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 4 previous similar messages Jul 27 00:57:24 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (6): c: 3, oc: 0, rc: 6 Jul 27 00:57:24 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 4 previous similar messages Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: 20239:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564214239/real 0] req@ffff8f41c7bd2400 x1636747256288688/t0(0) o13->fir-OST000f-osc-MDT0002@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564214246 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: fir-OST0023-osc-MDT0002: Connection to fir-OST0023 (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: 20239:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564214239/real 0] req@ffff8f41c7bd3600 x1636747256288704/t0(0) o13->fir-OST0024-osc-MDT0000@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564214246 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: 
20240:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: fir-OST0024-osc-MDT0000: Connection to fir-OST0024 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 00:57:26 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 46527:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=11 reqQ=0 recA=57, svcEst=20, delay=7514 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 24570:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.112.15@o2ib4: deadline 6:2s ago req@ffff8f2f1ee7b450 x1638885575604880/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:24/0 lens 488/0 e 0 to 0 dl 1564214244 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 24570:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 24 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f1043280050 x1638951381146000/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:24/0 lens 488/0 e 0 to 0 dl 1564214244 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 24570:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. 
req@ffff8f2f1ee7b450 x1638885575604880/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:24/0 lens 488/0 e 0 to 0 dl 1564214244 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 27443:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f0d4a9f5400 x1631550658553344/t0(0) o103->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 45 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 24570:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 27443:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 16 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f16e26d2850 x1638830600027360/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564214244 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1f9217ea00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb4e00 Jul 27 00:57:28 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 3 seconds Jul 27 00:57:28 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 27 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 21829:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.28.12@o2ib6 from 10.0.10.51@o2ib7 Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 21829:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 16 
previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f26096e4e00 Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 9 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.105@o2ib7 (0): c: 0, oc: 1, rc: 8 Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 9 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21987:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0fc585f800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f203aa71400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21794:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0fc585ce00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3710c2b600 Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 68 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20731e8000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f20731ebc00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f14b622d400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3ec905f400 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -110 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e3b865400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb4c00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb3e00 Jul 27 00:57:28 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.210@o2ib7: accepting Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f14b622d200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21792:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2af262f600 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20731ea800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3e47aee000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec9059e00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20731eea00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec9b2ca00 Jul 27 
00:57:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e3b867200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905c800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905fe00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a616c8000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20731e9600 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f08d0d08000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b5100ce00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f08d0d0b400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec9059400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905e000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905b200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3fc7aaa000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 6548:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2af262ec00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21390:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f20731e9600 Jul 
27 00:57:28 fir-md1-s1 kernel: LustreError: 44034:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3ff9ca5400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21448:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1925f94400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46513:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f143c66d200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46564:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2506c7b800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905b400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905f600 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb2200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3fc7aaae00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3ec905a400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f350fb8f200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3fc7aa8a00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2601eb2400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3fc7aaa400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 22650:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f20731eea00 
Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46516:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2601eb1c00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 22431:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3c23f72e00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21617:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f40cd35a800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 57558:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1b6031d400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46523:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec9b2f600 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 22058:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f312fb4c800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 24571:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2601eb4000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21498:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f138aaea000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21284:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f41595b5a00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21567:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1b6031ea00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 42894:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f351e039200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46594:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f203aa76000 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46551:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1f9217d400 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Jul 27 
00:57:28 fir-md1-s1 kernel: LustreError: 21450:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21450:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.8.11.11@o2ib6 x1632261596459536 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 23107:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f26096e6600 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46531:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f351e03f800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46518:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f35baac6e00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 22670:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2601eb7c00 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46546:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f41595b2200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46525:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f312fb49200 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 22059:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f26096e6800 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 21682:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f14b622c600 Jul 27 00:57:28 fir-md1-s1 kernel: LustreError: 46570:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f08d0d08a00 Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 23746:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.27.2@o2ib6 from Jul 27 00:57:28 fir-md1-s1 kernel: LNetError: 23746:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 66536 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 20248:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for sent delay: [sent 1564214239/real 1564214246] req@ffff8f41c7bd6600 x1636747256288848/t0(0) o13->fir-OST001e-osc-MDT0000@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564214246 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 20248:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 73806 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: fir-OST001e-osc-MDT0000: Connection to fir-OST001e (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=11 reqQ=0 recA=25, svcEst=20, delay=7486 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 15 previous similar messages Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f16e26dc450 x1631305348220192/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564214244 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 00:57:28 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 57 previous similar messages Jul 27 00:57:30 fir-md1-s1 kernel: LustreError: 65760:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f434a915850 x1638898404219472/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564214268 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with fba6feb3-1d06-9f10-9905-c04ad67c5c45 (at 10.9.115.13@o2ib4), client will retry: rc -110 Jul 27 00:57:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 00:57:30 fir-md1-s1 kernel: LustreError: 65760:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 27 00:57:31 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564214244/real 1564214246] req@ffff8f364feaf800 x1636747256289248/t0(0) o13->fir-OST0020-osc-MDT0002@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564214251 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 00:57:31 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 27 00:57:31 fir-md1-s1 kernel: Lustre: fir-OST0020-osc-MDT0002: Connection to fir-OST0020 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 00:57:31 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 00:57:31 fir-md1-s1 kernel: LustreError: 46514:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+7s req@ffff8f2c7a3c7450 x1638085467273024/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:24/0 lens 488/440 e 0 to 
0 dl 1564214244 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:31 fir-md1-s1 kernel: LustreError: 46514:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 22 previous similar messages Jul 27 00:57:31 fir-md1-s1 kernel: Lustre: 46514:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:7s); client may timeout. req@ffff8f2c7a3c7450 x1638085467273024/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564214244 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 00:57:31 fir-md1-s1 kernel: Lustre: 46514:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 62 previous similar messages Jul 27 00:57:33 fir-md1-s1 kernel: Lustre: 46583:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3b7ce75050 x1638875347377872/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564214258 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:33 fir-md1-s1 kernel: Lustre: 46583:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 27 00:57:38 fir-md1-s1 kernel: LustreError: 66901:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f3dffc0dc50 x1638905436782448/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564214258 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97481f17-b98d-0828-17b9-32f14b205b6e (at 10.9.114.13@o2ib4), client will retry: rc -110 Jul 27 00:57:38 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 00:57:38 fir-md1-s1 kernel: LustreError: 66901:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 00:57:40 fir-md1-s1 kernel: Lustre: 22433:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. 
req@ffff8f434a915450 x1638824616257168/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564214258 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 00:57:40 fir-md1-s1 kernel: Lustre: 22433:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 00:57:42 fir-md1-s1 kernel: Lustre: 21411:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0bd5945100 x1638952719829760/t0(0) o101->e4dece5a-12cf-6038-cf46-eb184afeeaa8@10.9.103.4@o2ib4:17/0 lens 480/568 e 1 to 0 dl 1564214267 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:42 fir-md1-s1 kernel: Lustre: 21411:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 27 00:57:47 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2891f94140/0x5d9ee689e13757e2 lrc: 3/0,0 mode: PR/PR res: [0x2c002c6a6:0xc8ec:0x0].0x0 bits 0x58/0x0 rrc: 3 type: IBT flags: 0x60200400010020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3ec1f11d8c expref: 7000 pid: 50584 timeout: 3329327 lvb_type: 0 Jul 27 00:57:47 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Jul 27 00:57:48 fir-md1-s1 kernel: LustreError: 46548:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+0s req@ffff8f4517172050 x1638875347378000/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564214268 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:48 fir-md1-s1 kernel: LustreError: 46548:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 00:57:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97481f17-b98d-0828-17b9-32f14b205b6e (at 10.9.114.13@o2ib4), client will retry: rc -110 Jul 27 00:57:48 fir-md1-s1 kernel: 
Lustre: Skipped 4 previous similar messages Jul 27 00:57:50 fir-md1-s1 kernel: LustreError: 46521:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2c7a3c6450 x1638929277378480/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564214268 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:50 fir-md1-s1 kernel: LustreError: 46521:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 00:57:51 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f268efd1850 x1639958520286112/t0(0) o3->f111b25a-6d2a-16a8-5df8-392d9e810365@10.8.15.4@o2ib6:26/0 lens 488/440 e 0 to 0 dl 1564214276 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:51 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 107 previous similar messages Jul 27 00:57:56 fir-md1-s1 kernel: LustreError: 22059:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f268efd1850 x1639958520286112/t0(0) o3->f111b25a-6d2a-16a8-5df8-392d9e810365@10.8.15.4@o2ib6:26/0 lens 488/440 e 0 to 0 dl 1564214276 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 00:57:56 fir-md1-s1 kernel: LustreError: 22059:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 92 previous similar messages Jul 27 00:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f111b25a-6d2a-16a8-5df8-392d9e810365 (at 10.8.15.4@o2ib6), client will retry: rc -110 Jul 27 00:57:56 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 27 00:58:05 fir-md1-s1 kernel: Lustre: 23670:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-4), not sending early reply req@ffff8f3f4e8e4800 x1631563735323504/t0(0) o101->492585ec-a5aa-1ba0-19ca-f69975156f5c@10.9.103.7@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1564214290 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 00:58:08 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.103.18@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0df6579680/0x5d9ee68a0f17af89 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6b7:0x16:0x0].0x0 bits 0x40/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.9.103.18@o2ib4 remote: 0x2dc8d25708a3143f expref: 600 pid: 21419 timeout: 3329343 lvb_type: 0 Jul 27 00:58:08 fir-md1-s1 kernel: Lustre: 23753:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:44s); client may timeout. req@ffff8f2e1fc70c00 x1631538595157488/t0(0) o101->3b7f9e8e-0cc2-e0b1-ed46-2872567345ed@10.9.103.29@o2ib4:24/0 lens 480/536 e 0 to 0 dl 1564214244 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 00:58:08 fir-md1-s1 kernel: Lustre: 23753:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Jul 27 00:58:09 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f250bed8400 ns: mdt-fir-MDT0000_UUID lock: ffff8f34bbac1440/0x5d9ee68a0f17c9d7 lrc: 3/0,0 mode: PW/PW res: [0x20002976c:0x864b:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x50200000000000 nid: 10.8.27.2@o2ib6 remote: 0x5cc274adaa6b6c00 expref: 269 pid: 23746 timeout: 0 lvb_type: 0 Jul 27 00:58:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.1@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f27fd546e40/0x5d9ee68a0ef053b8 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6b7:0x5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.103.1@o2ib4 remote: 0x97b87152cf7d923 expref: 369 pid: 23575 timeout: 3329350 lvb_type: 0 Jul 27 00:58:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 27 00:58:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9101e47c-5087-9ebf-bb20-6ff2bf817bf0 (at 
10.9.101.32@o2ib4) reconnecting Jul 27 00:58:30 fir-md1-s1 kernel: Lustre: Skipped 708 previous similar messages Jul 27 01:05:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 01:05:58 fir-md1-s1 kernel: Lustre: Skipped 343 previous similar messages Jul 27 01:06:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 01:06:50 fir-md1-s1 kernel: Lustre: Skipped 1096 previous similar messages Jul 27 01:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 01:08:43 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 27 01:12:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 01:12:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 01:17:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 01:17:08 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 27 01:18:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 01:18:13 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 27 01:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 01:18:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 01:24:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f6da02800, cur 1564215851 expire 1564215701 last 1564215624 Jul 27 01:25:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 01:25:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 01:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 01:27:10 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 27 01:28:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 01:28:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 01:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 01:29:42 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 27 01:37:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 01:37:20 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 01:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 01:39:45 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 27 01:40:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 01:40:39 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 01:43:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 01:43:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 01:47:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 01:47:25 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 01:50:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 01:50:59 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 01:51:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 01:51:12 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 01:53:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 01:53:50 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 01:57:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 01:57:36 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 27 02:01:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 02:01:06 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 27 02:02:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 02:02:51 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 27 02:07:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 02:07:15 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 02:07:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 02:07:46 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 27 02:09:14 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0d24c25450 x1631550676151584/t0(0) o3->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:19/0 lens 488/16824 e 1 to 0 dl 1564218559 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 02:09:35 fir-md1-s1 kernel: Lustre: 57787:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:16s); client may timeout. req@ffff8f0d24c25450 x1631550676151584/t0(0) o3->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:19/0 lens 488/16792 e 1 to 0 dl 1564218559 ref 1 fl Complete:/0/0 rc 16384/16384 Jul 27 02:09:35 fir-md1-s1 kernel: Lustre: 57787:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 02:11:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 02:11:29 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 02:13:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 27 02:13:56 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 27 02:18:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 02:18:13 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 27 02:21:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 02:21:30 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 02:21:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 02:21:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 02:25:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 02:25:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 02:28:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 02:28:23 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 02:31:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 02:31:53 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 02:34:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 02:34:16 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 27 02:35:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 02:35:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 27 02:37:21 fir-md1-s1 kernel: Lustre: 24576:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f16eb8e5100 x1638935119133664/t0(0) o101->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:0/0 lens 1768/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 21737:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f2793cb3850 x1638830729221680/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564220241 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:21 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 02:37:21 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44bc67f400 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1bacdde800 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2601eb7a00 Jul 27 02:37:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1803e59800 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0f20cdd200 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1bacddd000 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1b6031e200 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0f20cdb000 Jul 27 02:37:21 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f20dcde7600 Jul 27 02:37:21 fir-md1-s1 kernel: Lustre: 24576:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 32 previous similar messages Jul 27 02:37:22 fir-md1-s1 kernel: LustreError: 46542:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3fc893bc50 x1639155186279664/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564220255 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:22 fir-md1-s1 kernel: LustreError: 46542:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 02:37:23 fir-md1-s1 kernel: LustreError: 25632:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1647d24050 x1638935119133264/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564220265 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f7baec68-f8c8-0730-9508-ba1e77698953 (at 10.9.114.6@o2ib4), client will retry: rc -110 Jul 27 02:37:23 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 27 02:37:23 fir-md1-s1 kernel: LustreError: 25632:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 47 previous similar messages Jul 27 02:37:24 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ 
req@ffff8f2793cb3c50 x1638870347960608/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564220265 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:24 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 27 02:37:28 fir-md1-s1 kernel: LustreError: 21712:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1647d26850 x1639234579330304/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564220265 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 27 02:37:28 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 27 02:37:28 fir-md1-s1 kernel: LustreError: 21712:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 27 02:37:30 fir-md1-s1 kernel: Lustre: 24569:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2dd2981850 x1631631195219152/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564220255 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:30 fir-md1-s1 kernel: Lustre: 46586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3fc893cc50 x1631631195219520/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564220255 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:30 fir-md1-s1 kernel: Lustre: 46586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 02:37:35 fir-md1-s1 kernel: LustreError: 23107:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 14+0s req@ffff8f2dd2981850 x1631631195219152/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:5/0 lens 
488/440 e 1 to 0 dl 1564220255 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:35 fir-md1-s1 kernel: LustreError: 23107:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 02:37:37 fir-md1-s1 kernel: Lustre: 14790:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0565669850 x1638892474826464/t0(0) o4->2f18c3b0-076e-4e2d-5fbb-f0a683181101@10.9.114.3@o2ib4:11/0 lens 504/448 e 1 to 0 dl 1564220261 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:37 fir-md1-s1 kernel: Lustre: 14790:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 02:37:37 fir-md1-s1 kernel: LustreError: 20508:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3fc893c450 x1631631195219808/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564220265 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 27 02:37:37 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 27 02:37:37 fir-md1-s1 kernel: LustreError: 20508:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 27 02:37:41 fir-md1-s1 kernel: LustreError: 46570:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f0565669850 x1638892474826464/t0(0) o4->2f18c3b0-076e-4e2d-5fbb-f0a683181101@10.9.114.3@o2ib4:11/0 lens 504/448 e 1 to 0 dl 1564220261 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 02:37:41 fir-md1-s1 kernel: LustreError: 46570:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 02:37:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 2f18c3b0-076e-4e2d-5fbb-f0a683181101 (at 10.9.114.3@o2ib4), client will retry: rc = -110 Jul 27 02:38:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 02:38:40 fir-md1-s1 kernel: Lustre: Skipped 182 previous similar messages Jul 27 02:42:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 02:42:02 fir-md1-s1 kernel: Lustre: Skipped 125 previous similar messages Jul 27 02:45:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 02:45:17 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 02:45:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 02:45:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 02:48:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 02:48:49 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 27 02:52:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 02:52:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 02:55:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 02:55:31 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 27 02:55:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 02:55:52 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 02:56:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f28c2c24800, cur 1564221387 expire 1564221237 last 1564221160 Jul 27 02:58:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 02:58:53 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 03:02:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 03:02:16 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 03:05:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 03:05:33 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 27 03:07:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 03:07:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 03:09:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1de8e62800, cur 1564222151 expire 1564222001 last 1564221924 Jul 27 03:09:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 03:09:19 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 27 03:10:12 fir-md1-s1 kernel: Lustre: 46571:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f069e743050 x1638830778131744/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564222217 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 03:13:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 03:13:10 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 03:15:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 03:15:33 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 27 03:18:00 fir-md1-s1 kernel: Lustre: 23093:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1246890050 x1638929527140496/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564222685 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 03:18:06 fir-md1-s1 kernel: LustreError: 21545:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1246890050 x1638929527140496/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564222685 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 03:18:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -107 Jul 27 03:18:06 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 03:18:06 fir-md1-s1 kernel: Lustre: 
21545:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f1246890050 x1638929527140496/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564222685 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 27 03:19:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 03:19:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 03:19:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 03:19:38 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 27 03:23:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 03:23:37 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 27 03:25:38 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 27 03:25:38 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 27 03:25:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 03:25:46 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 27 03:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 03:30:06 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 27 03:33:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 03:33:42 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 03:38:16 fir-md1-s1 
kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 03:38:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 03:39:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 03:39:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 03:40:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 03:40:34 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 27 03:44:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 03:44:15 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 03:49:25 fir-md1-s1 kernel: Lustre: 70067:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0dbce18850 x1631631381020240/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:0/0 lens 488/440 e 1 to 0 dl 1564224570 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 03:49:36 fir-md1-s1 kernel: Lustre: 18782:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:6s); client may timeout. 
req@ffff8f0dbce18850 x1631631381020240/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:0/0 lens 488/408 e 1 to 0 dl 1564224570 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 27 03:50:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 03:50:50 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 27 03:50:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 03:50:50 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 03:51:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 03:53:40 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a04416050 x1638898745545200/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:15/0 lens 488/440 e 1 to 0 dl 1564224825 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 03:53:51 fir-md1-s1 kernel: LustreError: 57787:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0a04416050 x1638898745545200/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:15/0 lens 488/440 e 1 to 0 dl 1564224825 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 03:53:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with fba6feb3-1d06-9f10-9905-c04ad67c5c45 (at 10.9.115.13@o2ib4), client will retry: rc -107 Jul 27 03:53:51 fir-md1-s1 kernel: Lustre: 57787:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:6s); client may timeout. 
req@ffff8f0a04416050 x1638898745545200/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:15/0 lens 488/440 e 1 to 0 dl 1564224825 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 27 03:54:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 03:54:35 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 04:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 04:00:52 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 27 04:00:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 04:00:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 27 04:01:57 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 27 04:02:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 04:02:57 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 27 04:04:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 04:04:46 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 04:10:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 04:10:54 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 27 04:12:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 04:12:31 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 27 04:15:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 04:15:05 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 04:19:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 04:19:13 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 04:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd7772a0-5656-7b9e-2b19-3f87efa63ec1 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f36ff24c400, cur 1564226424 expire 1564226274 last 1564226197 Jul 27 04:21:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 04:21:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 04:23:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 27 04:23:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 27 04:25:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 04:25:15 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 04:29:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 04:29:45 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 04:31:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 04:31:23 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 27 04:33:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 04:33:09 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 27 04:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 04:35:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 04:41:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 04:41:29 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 27 04:41:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 27 04:41:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 04:43:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 04:43:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 27 04:45:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 04:45:21 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 04:51:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 04:51:32 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 04:53:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 04:53:29 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 04:53:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 04:53:38 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 27 04:55:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 04:55:33 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 05:01:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 05:01:37 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 27 05:01:43 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228895/real 1564228895] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228902 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 05:01:43 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 27 05:01:50 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228903/real 1564228903] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228910 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:01:57 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228910/real 1564228910] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228917 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:02:00 fir-md1-s1 kernel: Lustre: 22281:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending 
early reply req@ffff8f19e6159e00 x1638749272900384/t0(0) o101->957c1ad0-d547-b44d-0f14-5f92c3213a3d@10.8.15.3@o2ib6:5/0 lens 480/568 e 0 to 0 dl 1564228925 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:02:04 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228917/real 1564228917] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228924 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:02:11 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228924/real 1564228924] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228931 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:02:25 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228938/real 1564228938] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228945 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:02:25 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 27 05:02:46 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228959/real 1564228959] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564228966 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:02:46 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 27 05:03:21 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564228994/real 1564228994] req@ffff8f1e06a7dd00 x1636747385138080/t0(0) 
o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564229001 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 05:03:21 fir-md1-s1 kernel: Lustre: 24578:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 27 05:04:03 fir-md1-s1 kernel: LustreError: 24578:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) returned error from glimpse AST (req@ffff8f1e06a7dd00 x1636747385138080 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2530752d00/0x5d9ee68aff6262bc lrc: 4/0,0 mode: PW/PW res: [0x2000222f5:0x2c5:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.9.8@o2ib6 remote: 0x1142d770e8b6ac8d expref: 52 pid: 24580 timeout: 0 lvb_type: 0 Jul 27 05:04:03 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Jul 27 05:04:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 05:04:03 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 198s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2530752d00/0x5d9ee68aff6262bc lrc: 4/0,0 mode: PW/PW res: [0x2000222f5:0x2c5:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.9.8@o2ib6 remote: 0x1142d770e8b6ac8d expref: 53 pid: 24580 timeout: 0 lvb_type: 0 Jul 27 05:04:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9e45188c-99b3-1fcf-f3a1-bc8544f2d813 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f183f238800, cur 1564229081 expire 1564228931 last 1564228854 Jul 27 05:04:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 05:05:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 05:05:10 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 05:05:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 05:05:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 05:06:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 05:06:41 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 05:09:22 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 05:09:22 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (5): c: 2, oc: 0, rc: 8 Jul 27 05:09:22 fir-md1-s1 kernel: LNetError: 23753:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.20.27@o2ib6 from 10.0.10.51@o2ib7 Jul 27 05:09:22 fir-md1-s1 kernel: LNetError: 23753:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 7302 previous similar messages Jul 27 05:09:22 fir-md1-s1 kernel: LNetError: 20382:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.1.32@o2ib6 from 10.0.10.51@o2ib7 Jul 27 05:09:22 fir-md1-s1 kernel: LNetError: 20382:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 28 previous similar messages Jul 27 05:09:23 fir-md1-s1 kernel: LustreError: 21041:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f302ce25800 Jul 27 05:09:23 fir-md1-s1 kernel: LNetError: 
20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 05:09:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 27 05:09:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (6): c: 2, oc: 0, rc: 8 Jul 27 05:09:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 27 05:09:24 fir-md1-s1 kernel: LNetError: 24564:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.11.28@o2ib6 from 10.0.10.51@o2ib7 Jul 27 05:09:24 fir-md1-s1 kernel: LNetError: 24564:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 16 previous similar messages Jul 27 05:09:24 fir-md1-s1 kernel: Lustre: 20242:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564229357/real 0] req@ffff8f3e790bc500 x1636747391159008/t0(0) o13->fir-OST0028-osc-MDT0002@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564229364 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 05:09:24 fir-md1-s1 kernel: Lustre: 20242:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 27 05:09:24 fir-md1-s1 kernel: Lustre: fir-OST0028-osc-MDT0002: Connection to fir-OST0028 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:09:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 05:09:24 fir-md1-s1 kernel: LustreError: 46551:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3f073b1000 Jul 27 05:09:25 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f28b7b30850 x1639155485592352/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 
05:09:25 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 05:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 27 05:09:25 fir-md1-s1 kernel: LustreError: 46510:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3f073b2000 Jul 27 05:09:28 fir-md1-s1 kernel: Lustre: fir-OST0024-osc-MDT0002: Connection to fir-OST0024 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:09:28 fir-md1-s1 kernel: Lustre: 20571:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f094b5f0300 x1631561499779920/t0(0) o400->dedbe9ee-8903-d6b4-bf80-d42c33abfec1@10.9.108.57@o2ib4:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 05:09:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 05:09:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.104@o2ib7 (11): c: 0, oc: 0, rc: 8 Jul 27 05:09:28 fir-md1-s1 kernel: LNetError: 21535:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.29.8@o2ib6 from 10.0.10.51@o2ib7 Jul 27 05:09:28 fir-md1-s1 kernel: LNetError: 21535:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 147 previous similar messages Jul 27 05:09:28 fir-md1-s1 kernel: LustreError: 13135:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f09a5655850 x1638824876559216/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 59f5c312-adc4-b4a9-05e0-8c37d188c47f (at 10.9.112.13@o2ib4), client will retry: rc -110 Jul 27 05:09:28 
fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2a6cd80c00 Jul 27 05:09:28 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0ba65e6000 Jul 27 05:09:28 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 05:09:28 fir-md1-s1 kernel: LustreError: 25997:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1e8540e050 x1639155485592128/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:28 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.104@o2ib7: 1 seconds Jul 27 05:09:28 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 21 previous similar messages Jul 27 05:09:28 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f351e04ec00 Jul 27 05:09:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with d8d6f8e7-a2cd-08f2-c263-fa8b0dbeef3c (at 10.8.8.2@o2ib6), client will retry: rc = -110 Jul 27 05:09:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: fir-OST0013-osc-MDT0000: Connection to fir-OST0013 (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 9c7adb50-64f1-6d92-d619-cdf901757223 (at 10.9.108.11@o2ib4), client will retry: rc = -110 Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 9c7adb50-64f1-6d92-d619-cdf901757223 (at 10.9.108.11@o2ib4), client will retry: rc = -110 Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write 
error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc = -110 Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 23dbfbee-8f3b-27e7-f711-fd69cc641360 (at 10.9.115.10@o2ib4), client will retry: rc -110 Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 05:09:29 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 05:09:30 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f236a3f9850 x1636579155160592/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:30 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 12 previous similar messages Jul 27 05:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 98c9ba85-8fd3-45fc-0f1e-4163d0960e95 (at 10.8.7.13@o2ib6), client will retry: rc = -110 Jul 27 05:09:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 05:09:31 fir-md1-s1 kernel: Lustre: 23732:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f38ab286000 x1631654823720144/t0(0) o101->84b23abe-92b9-23b5-f8e1-877bc9a84312@10.9.103.15@o2ib4:6/0 lens 480/568 e 1 to 0 dl 1564229376 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:32 fir-md1-s1 kernel: Lustre: 22958:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26777bd050 x1640014110874352/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:34 fir-md1-s1 kernel: LustreError: 
21514:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3ea5cb6050 x1638951944248336/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564229398 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:34 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 27 05:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 27 05:09:34 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 27 05:09:34 fir-md1-s1 kernel: Lustre: 24566:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f25c5e3cc50 x1635098045754256/t0(0) o4->f0500fad-d6f6-55b9-90d1-85c7444ded54@10.8.1.10@o2ib6:9/0 lens 504/448 e 1 to 0 dl 1564229379 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:34 fir-md1-s1 kernel: Lustre: 24566:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 27 05:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6), client will retry: rc = -110 Jul 27 05:09:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 05:09:37 fir-md1-s1 kernel: LustreError: 66901:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 9+0s req@ffff8f361668fc50 x1638876870906944/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 27 05:09:39 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 05:09:39 fir-md1-s1 kernel: LustreError: 24570:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 
20+2s req@ffff8f305e81e850 x1633909513783568/t0(0) o3->c534882d-6030-1b8a-8c54-b433ef117432@10.9.108.56@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:39 fir-md1-s1 kernel: LustreError: 24570:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 05:09:39 fir-md1-s1 kernel: Lustre: 24570:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f305e81e850 x1633909513783568/t0(0) o3->c534882d-6030-1b8a-8c54-b433ef117432@10.9.108.56@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:09:40 fir-md1-s1 kernel: Lustre: 21037:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f305e818c50 x1631305874798736/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564229377 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:09:40 fir-md1-s1 kernel: Lustre: 21037:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 05:09:43 fir-md1-s1 kernel: Lustre: 21565:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3ea5373050 x1631548015610896/t0(0) o4->362621d0-7ac3-9c5b-280e-e0d76da4f0b2@10.9.106.66@o2ib4:18/0 lens 504/448 e 1 to 0 dl 1564229388 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:43 fir-md1-s1 kernel: Lustre: 21565:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 27 05:09:46 fir-md1-s1 kernel: LustreError: 20510:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3eb7b5f450 x1638824876559904/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564229398 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:46 fir-md1-s1 kernel: LustreError: 20510:0:(ldlm_lib.c:3258:target_bulk_io()) 
Skipped 40 previous similar messages Jul 27 05:09:48 fir-md1-s1 kernel: LustreError: 66902:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f3ea5373050 x1631548015610896/t0(0) o4->362621d0-7ac3-9c5b-280e-e0d76da4f0b2@10.9.106.66@o2ib4:18/0 lens 504/448 e 1 to 0 dl 1564229388 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with fba95cb2-7f99-3e67-d120-dcab27657fbe (at 10.9.106.8@o2ib4), client will retry: rc = -110 Jul 27 05:09:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 05:09:48 fir-md1-s1 kernel: LustreError: 66902:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 27 05:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -110 Jul 27 05:09:48 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 05:09:51 fir-md1-s1 kernel: Lustre: 23667:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-15), not sending early reply req@ffff8f38ab285100 x1638966183363200/t0(0) o101->6a159b93-cbcb-a910-1e2c-6484b2bca678@10.9.103.18@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1564229396 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:51 fir-md1-s1 kernel: Lustre: 23667:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 27 05:09:58 fir-md1-s1 kernel: LustreError: 46550:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f3fbf337450 x1634824943890912/t0(0) o4->70f16ef0-37e4-342c-34bd-1b5d3ec8a621@10.9.112.1@o2ib4:28/0 lens 488/448 e 0 to 0 dl 1564229398 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:09:58 fir-md1-s1 kernel: LustreError: 46550:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 05:10:12 fir-md1-s1 kernel: Lustre: 27321:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not 
sending early reply req@ffff8f07445b8f00 x1638883620885280/t0(0) o101->3411ffac-482d-1535-c486-9206f14b07f9@10.9.103.6@o2ib4:17/0 lens 480/568 e 0 to 0 dl 1564229417 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:10:12 fir-md1-s1 kernel: Lustre: 27321:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Jul 27 05:10:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 44s: evicting client at 10.9.103.26@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f397b7b06c0/0x5d9ee68b06fcbdf4 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6b7:0x8a:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.9.103.26@o2ib4 remote: 0xca3655998a87ad18 expref: 550 pid: 23592 timeout: 3344457 lvb_type: 0 Jul 27 05:10:13 fir-md1-s1 kernel: LustreError: 23636:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f148b1e5400 ns: mdt-fir-MDT0002_UUID lock: ffff8f440b672400/0x5d9ee68b0708079d lrc: 3/0,0 mode: PW/PW res: [0x2c002c494:0xb70:0x0].0x0 bits 0x40/0x0 rrc: 26 type: IBT flags: 0x50200400000020 nid: 10.9.103.15@o2ib4 remote: 0xe8bbe65a622cc230 expref: 423 pid: 23636 timeout: 0 lvb_type: 0 Jul 27 05:10:13 fir-md1-s1 kernel: Lustre: 23640:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:36s); client may timeout. 
req@ffff8f38ab286000 x1631654823720144/t0(0) o101->84b23abe-92b9-23b5-f8e1-877bc9a84312@10.9.103.15@o2ib4:6/0 lens 480/536 e 1 to 0 dl 1564229376 ref 1 fl Complete:/0/0 rc -107/-107 Jul 27 05:10:13 fir-md1-s1 kernel: LustreError: 23636:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 27 05:11:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 05:11:43 fir-md1-s1 kernel: Lustre: Skipped 1757 previous similar messages Jul 27 05:16:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 05:16:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 05:16:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 05:16:59 fir-md1-s1 kernel: Lustre: Skipped 1095 previous similar messages Jul 27 05:18:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 05:18:12 fir-md1-s1 kernel: Lustre: Skipped 645 previous similar messages Jul 27 05:22:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 05:22:54 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 27 05:27:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 05:27:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 05:28:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 05:28:44 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 05:32:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available 
for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 05:32:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 05:32:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 05:32:54 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 27 05:37:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 05:37:29 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 05:39:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 27 05:39:33 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 27 05:42:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 05:42:43 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 05:42:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 05:42:55 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 27 05:45:01 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564231494/real 1564231494] req@ffff8f3969f04e00 x1636747412754192/t0(0) o13->fir-OST0013-osc-MDT0000@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564231501 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 05:45:01 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 32 previous similar messages Jul 27 05:45:01 fir-md1-s1 kernel: Lustre: fir-OST0013-osc-MDT0000: Connection to fir-OST0013 (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:45:01 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 27 05:45:02 fir-md1-s1 kernel: LustreError: 21364:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f44d13f5450 x1636579209298368/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564231523 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:02 fir-md1-s1 kernel: LustreError: 21364:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 27 05:45:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 42f49237-eaa5-3549-e9cf-6b0ef8d87e1a (at 10.9.112.7@o2ib4), client will retry: rc -110 Jul 27 05:45:02 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 27 05:45:03 fir-md1-s1 kernel: Lustre: fir-OST001b-osc-MDT0000: Connection to fir-OST001b (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:45:03 fir-md1-s1 kernel: Lustre: fir-OST0002-osc-MDT0002: Connection 
to fir-OST0002 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:45:04 fir-md1-s1 kernel: Lustre: fir-OST0004-osc-MDT0002: Connection to fir-OST0004 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:45:04 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 46588:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=19 reqQ=0 recA=46, svcEst=1, delay=10151 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 46589:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.28.12@o2ib6: deadline 6:5s ago req@ffff8f15c255a850 x1638887788815200/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564231500 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 46588:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f15c255a850 x1638887788815200/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564231500 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20500:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f06bd7bc050 x1631305947540960/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564231524 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 46589:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 34 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 46588:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20500:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 15 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 13960:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0aede29050 x1636579209299472/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564231524 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 2 seconds Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 46589:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. 
req@ffff8f15c255a850 x1638887788815200/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564231500 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 13960:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 46589:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 5c9f5376-a105-7e2f-1c52-759657f6fd7d (at 10.9.101.59@o2ib4), client will retry: rc -107 Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.105@o2ib7 (13): c: 0, oc: 0, rc: 8 Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 11s req@ffff8f2000391800 x1636451771077248/t0(0) o103->5580c86e-93fc-ec0b-7809-c452eedb4044@10.9.106.23@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 27 05:45:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.105@o2ib7: 3 seconds Jul 27 05:45:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 8 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0789e7ae00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc 
ffff8f0789e78800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0f20cdb400 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f24eddbee00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 23097:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -5+5s req@ffff8f06bd7ba850 x1638952025892624/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564231500 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 23097:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 6 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1a08c4d800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f20dcde3c00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44aa28b600 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0f20cde000 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f41595b3e00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3fec98b400 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f12d89b6c00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2b61390c00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, 
desc ffff8f350e294200 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f394bcb8600 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0789e7cc00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34d3348800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f28d1a6bc00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f076c0e3c00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0e1ae67400 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0e1ae60200 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2a6cd83c00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3c61631a00 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f33d5ac7200 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f12d89b6600 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f24eddbd200 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0771259200 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status 
-103, desc ffff8f3c61630600 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f441bfe8200 Jul 27 05:45:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.212@o2ib7: accepting Jul 27 05:45:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 2 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44bc67f800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f350e293400 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f350e297800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f350e293200 Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20dcde2800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34d334f400 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3d73326400 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f350e292200 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f394e0e4800 Jul 27 05:45:05 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status 
-5, desc ffff8f2636ce1e00 Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 21679:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.8.3@o2ib6 from 10.0.10.51@o2ib7 Jul 27 05:45:05 fir-md1-s1 kernel: LNetError: 21679:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 7 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=13 reqQ=0 recA=23, svcEst=11, delay=10151 Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 9 previous similar messages Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f0d63ce4050 x1638085954658560/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564231500 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:45:05 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 42 previous similar messages Jul 27 05:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8a37f7b1-3efc-30e9-f8d1-739df6680357 (at 10.9.104.19@o2ib4), client will retry: rc = -110 Jul 27 05:45:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 27 05:45:08 fir-md1-s1 kernel: Lustre: 65760:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3e08ff9c50 x1639509333416272/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564231513 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:09 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3c9a6e5c50 x1636579209298368/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:1/0 lens 488/440 e 
0 to 0 dl 1564231531 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 05:45:09 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+10s req@ffff8f3020bad050 x1640014159457840/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564231499 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:09 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 15 previous similar messages Jul 27 05:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 27 05:45:09 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 05:45:09 fir-md1-s1 kernel: Lustre: 24567:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:10s); client may timeout. req@ffff8f3020bad050 x1640014159457840/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:29/0 lens 488/440 e 0 to 0 dl 1564231499 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:45:09 fir-md1-s1 kernel: Lustre: 24567:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 39 previous similar messages Jul 27 05:45:09 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 15 previous similar messages Jul 27 05:45:09 fir-md1-s1 kernel: Lustre: fir-OST000d-osc-MDT0002: Connection to fir-OST000d (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:45:09 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 27 05:45:10 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:10s); client may timeout. 
req@ffff8f06bd7bf050 x1638085954658032/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564231500 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:45:10 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 27 05:45:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 92b76833-e0a6-d520-474e-2227f356d2b3 (at 10.9.109.61@o2ib4), client will retry: rc = -110 Jul 27 05:45:12 fir-md1-s1 kernel: Lustre: 31002:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:12s); client may timeout. req@ffff8f2c5c72d050 x1636451771077344/t0(0) o103->5580c86e-93fc-ec0b-7809-c452eedb4044@10.9.106.23@o2ib4:0/0 lens 328/192 e 0 to 0 dl 1564231500 ref 1 fl Complete:H/0/0 rc 0/0 Jul 27 05:45:12 fir-md1-s1 kernel: Lustre: 31002:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 05:45:13 fir-md1-s1 kernel: LustreError: 21685:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f3e08ff9c50 x1639509333416272/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564231513 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:13 fir-md1-s1 kernel: LustreError: 21685:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 9 previous similar messages Jul 27 05:45:18 fir-md1-s1 kernel: LustreError: 20510:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+4s req@ffff8f3e08ffc850 x1638876938636112/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564231514 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:18 fir-md1-s1 kernel: LustreError: 20510:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 27 05:45:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c4566649-5001-d956-15cb-934d725d7f29 (at 10.9.113.11@o2ib4), client will retry: rc -110 
Jul 27 05:45:18 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 05:45:18 fir-md1-s1 kernel: Lustre: 20510:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. req@ffff8f3e08ffc850 x1638876938636112/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564231514 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:45:18 fir-md1-s1 kernel: Lustre: 20510:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3e08ffb450 x1638935437586640/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564231523 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:18 fir-md1-s1 kernel: Lustre: 20510:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 27 05:45:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.106.23@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3fe8e0e780/0x5d9ee68b256ad201 lrc: 3/0,0 mode: EX/EX res: [0x2c002c6c0:0x5472:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.9.106.23@o2ib4 remote: 0xd42c8c1f1687c5f5 expref: 5120 pid: 23636 timeout: 3346583 lvb_type: 3 Jul 27 05:45:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 27 05:45:23 fir-md1-s1 kernel: LustreError: 23636:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f451754e800 ns: mdt-fir-MDT0002_UUID lock: ffff8f3fe8e09440/0x5d9ee68b256adf1a lrc: 3/0,0 mode: EX/EX res: [0x2c002c6c0:0x546c:0x0].0x0 bits 0x8/0x0 rrc: 4 type: IBT flags: 0x50000000000000 nid: 10.9.106.23@o2ib4 remote: 0xd42c8c1f1687c62d expref: 5098 pid: 23636 timeout: 0 lvb_type: 3 Jul 27 05:45:23 fir-md1-s1 kernel: LustreError: 
23636:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Jul 27 05:45:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ace03b91-b18a-96ac-469e-d0915619acae (at 10.9.105.44@o2ib4), client will retry: rc = -110 Jul 27 05:45:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 05:45:25 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564231494/real 1564231505] req@ffff8f0c966e7800 x1636747412754592/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 1 dl 1564231525 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 05:45:25 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 127 previous similar messages Jul 27 05:45:29 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+5s req@ffff8f3e08ffb850 x1638876938636880/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564231524 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:29 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 37 previous similar messages Jul 27 05:45:29 fir-md1-s1 kernel: Lustre: 56757:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:5s); client may timeout. 
req@ffff8f3e08ff9850 x1638876938636576/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564231524 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 05:45:29 fir-md1-s1 kernel: Lustre: 56757:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 05:45:29 fir-md1-s1 kernel: Lustre: fir-MDT0003-osp-MDT0002: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 05:45:30 fir-md1-s1 kernel: Lustre: 23734:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f734abc00 x1635343753069696/t0(0) o101->c14bf4c5-b9f6-d04f-2c8a-c85dd78efbd5@10.9.109.45@o2ib4:5/0 lens 1808/3288 e 0 to 0 dl 1564231535 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 05:45:30 fir-md1-s1 kernel: Lustre: 23734:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 55 previous similar messages Jul 27 05:45:34 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.109.14@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1fb91821c0/0x5d9ee68b222df8cd lrc: 3/0,0 mode: PR/PR res: [0x2000299e3:0x37e:0x0].0x0 bits 0x13/0x0 rrc: 14 type: IBT flags: 0x60200400000020 nid: 10.9.109.14@o2ib4 remote: 0x1580def359acbeb4 expref: 17 pid: 97644 timeout: 3346594 lvb_type: 0 Jul 27 05:45:36 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.1@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0e24d56e40/0x5d9ee68b256aecdb lrc: 3/0,0 mode: PW/PW res: [0x2c002c494:0xb7a:0x0].0x0 bits 0x40/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.9.103.1@o2ib4 remote: 0x97b87152cf9ffbb expref: 435 pid: 23646 timeout: 3346596 lvb_type: 0 Jul 27 05:47:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b 
(at 10.8.11.6@o2ib6) reconnecting Jul 27 05:47:44 fir-md1-s1 kernel: Lustre: Skipped 1485 previous similar messages Jul 27 05:51:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 05:51:49 fir-md1-s1 kernel: Lustre: Skipped 727 previous similar messages Jul 27 05:52:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 05:52:59 fir-md1-s1 kernel: Lustre: Skipped 2289 previous similar messages Jul 27 05:53:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 05:53:23 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 05:58:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 05:58:49 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 27 06:02:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 06:02:39 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 06:02:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 06:02:59 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 27 06:04:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 06:04:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 06:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 06:08:59 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 27 06:13:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 06:13:50 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 27 06:14:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 06:14:07 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 27 06:15:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 06:15:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 06:19:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 06:19:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 22990:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=16 reqQ=0 recA=25, svcEst=20, delay=11198 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 20504:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=14 reqQ=0 recA=35, svcEst=1, delay=11206 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 71853:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 12s req@ffff8f0edd19d700 x1631631701996432/t0(0) o35->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 71853:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 5 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 22990:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-6s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f38fe206c50 x1638906106764064/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:25/0 lens 488/0 e 0 to 0 dl 1564233625 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21539:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.114.6@o2ib4: deadline 6:6s ago req@ffff8f2d8e65d050 x1638824958172336/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:25/0 lens 488/0 e 0 to 0 dl 1564233625 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 22990:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 46539:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. 
req@ffff8f38fe206c50 x1638906106764064/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:25/0 lens 488/0 e 0 to 0 dl 1564233625 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21539:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 46539:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 23739:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564233619/real 1564233619] req@ffff8f2addd69200 x1636747429413136/t0(0) o106->fir-MDT0002@10.9.108.2@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564233626 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 23739:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21534:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -6+6s req@ffff8f4015644050 x1639509426942864/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:25/0 lens 488/440 e 0 to 0 dl 1564233625 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21534:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 10 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 6 seconds Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 9 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 2, rc: 7 Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 9 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 
20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1803e5e400 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 6 seconds Jul 27 06:20:31 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 140 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 46534:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.3@o2ib6 from 10.0.10.51@o2ib7 Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 46534:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 11 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 25997:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2edd98dc00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 27583:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34d334ba00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 24570:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34c63da200 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21735:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34c63dc200 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 22432:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f16a2e9ba00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f351e03e600 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f350e296000 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: fir-OST0020-osc-MDT0000: Connection to fir-OST0020 (at 
10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 42894:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3fec98e200 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 25998:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2057dcf800 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 25634:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2a6cd81800 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 22649:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2edd98d200 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 20506:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f38a939ee00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 27481:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34d334c800 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21537:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ddfbd6200 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 46532:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f24eddb9000 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 24564:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34c63de800 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 22181:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34f41be000 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 46511:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34f41bce00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 27602:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f24eddbac00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:305:request_in_callback()) event type 2, status -103, service mdt_io Jul 27 06:20:31 fir-md1-s1 kernel: 
LustreError: 21451:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 21451:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.102.50@o2ib4 x1631628421716976 Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 06:20:31 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f41595b7a00 Jul 27 06:20:31 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f35baac2e00 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=13 reqQ=0 recA=17, svcEst=20, delay=11198 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 18 previous similar messages Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-6s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f330308ac50 x1638824958172528/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:25/0 lens 488/0 e 0 to 0 dl 1564233625 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 06:20:31 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 92 previous similar messages Jul 27 06:20:35 fir-md1-s1 kernel: LustreError: 22226:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+11s req@ffff8f3701288450 x1638250348838432/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564233624 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:35 fir-md1-s1 kernel: LustreError: 22226:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 30 previous similar messages Jul 27 06:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 27 06:20:35 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 27 06:20:35 fir-md1-s1 kernel: Lustre: 22226:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:11s); client may timeout. 
req@ffff8f3701288450 x1638250348838432/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564233624 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 06:20:35 fir-md1-s1 kernel: Lustre: 22226:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 99 previous similar messages Jul 27 06:20:38 fir-md1-s1 kernel: Lustre: 23573:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564233631/real 1564233631] req@ffff8f0e0439dd00 x1636747429413264/t0(0) o106->fir-MDT0002@10.9.108.2@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564233638 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 06:20:38 fir-md1-s1 kernel: Lustre: 23573:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 49 previous similar messages Jul 27 06:20:43 fir-md1-s1 kernel: Lustre: 25997:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f208ea4a850 x1638086054049792/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564233648 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:46 fir-md1-s1 kernel: Lustre: 25997:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f208ea4c450 x1638906106763920/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564233651 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:46 fir-md1-s1 kernel: Lustre: 25997:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 51 previous similar messages Jul 27 06:20:48 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 17+0s req@ffff8f38fe204c50 x1639235092077216/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564233648 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 
0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 27 06:20:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 06:20:48 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 27 06:20:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.15@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f3175e28900/0x5d9ee68b3f658599 lrc: 3/0,0 mode: PR/PR res: [0x2000298a3:0x30bf:0x0].0x0 bits 0x5b/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.9.101.15@o2ib4 remote: 0x1bcae83b3603cac9 expref: 116 pid: 23642 timeout: 3348708 lvb_type: 0 Jul 27 06:20:49 fir-md1-s1 kernel: LustreError: 97668:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f350dac4800 ns: mdt-fir-MDT0000_UUID lock: ffff8f21fde469c0/0x5d9ee68b41df0fe8 lrc: 1/0,0 mode: EX/EX res: [0x2000298a3:0x30bf:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.9.101.15@o2ib4 remote: 0x1bcae83b3603d883 expref: 4 pid: 97668 timeout: 0 lvb_type: 3 Jul 27 06:20:49 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:23s); client may timeout. 
req@ffff8f119ee6fb00 x1631682483992720/t434359292695(0) o101->bfaf32fd-a75c-1493-838b-c2682e1a6ae6@10.9.101.15@o2ib4:25/0 lens 376/1568 e 0 to 0 dl 1564233625 ref 1 fl Complete:/0/0 rc -107/-107 Jul 27 06:20:49 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 27 06:20:51 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f24d2116300 x1631664491822288/t0(0) o101->03f309ce-970e-e12f-7319-08fedff79d7c@10.9.101.16@o2ib4:26/0 lens 376/1600 e 1 to 0 dl 1564233656 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:51 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 27 06:20:56 fir-md1-s1 kernel: LustreError: 27581:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 18+7s req@ffff8f1b909b1c50 x1639235092078032/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:19/0 lens 488/440 e 0 to 0 dl 1564233649 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:20:56 fir-md1-s1 kernel: LustreError: 27581:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 46 previous similar messages Jul 27 06:20:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 27 06:20:56 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 06:20:59 fir-md1-s1 kernel: Lustre: 70067:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:11s); client may timeout. 
req@ffff8f1166516850 x1638875865142352/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564233648 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 06:20:59 fir-md1-s1 kernel: Lustre: 70067:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 06:21:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.37@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f10437be9c0/0x5d9ee68b3f71d9ea lrc: 3/0,0 mode: PR/PR res: [0x200029899:0x22b5:0x0].0x0 bits 0x5b/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.101.37@o2ib4 remote: 0xd57e9d73f225b936 expref: 109 pid: 23593 timeout: 3348725 lvb_type: 0 Jul 27 06:21:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Jul 27 06:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 06:24:06 fir-md1-s1 kernel: Lustre: Skipped 1256 previous similar messages Jul 27 06:24:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 27 06:24:32 fir-md1-s1 kernel: Lustre: Skipped 338 previous similar messages Jul 27 06:26:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 06:29:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 06:29:20 fir-md1-s1 kernel: Lustre: Skipped 876 previous similar messages Jul 27 06:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d652b49f-b6b1-d653-3249-7a6feb84dd30 (at 10.9.115.3@o2ib4) Jul 27 06:34:18 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 27 06:35:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 06:35:51 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 27 06:39:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 06:39:19 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 06:39:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 06:39:46 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 27 06:44:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 06:44:27 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 27 06:47:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 27 06:47:50 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 06:48:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24aa3e9000, cur 1564235284 expire 1564235134 last 1564235057 Jul 27 06:48:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 06:50:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 06:50:00 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 27 06:50:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 06:50:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 06:54:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 06:54:28 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 27 06:55:50 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 06:55:50 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.211@o2ib7 (5): c: 5, oc: 0, rc: 6 Jul 27 06:55:51 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f14a569d450 x1638875925624256/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564235765 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:55:51 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 27 06:55:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97481f17-b98d-0828-17b9-32f14b205b6e (at 10.9.114.13@o2ib4), client will retry: rc -110 Jul 27 06:55:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 27 06:55:52 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 06:55:52 fir-md1-s1 
kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Jul 27 06:55:52 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (7): c: 1, oc: 0, rc: 8 Jul 27 06:55:52 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Jul 27 06:55:53 fir-md1-s1 kernel: LustreError: 21617:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0c44f27050 x1638952192879792/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564235764 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:55:53 fir-md1-s1 kernel: LustreError: 21617:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 06:55:55 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 06:55:55 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Jul 27 06:55:55 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (10): c: 3, oc: 0, rc: 8 Jul 27 06:55:55 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Jul 27 06:55:55 fir-md1-s1 kernel: Lustre: 20212:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564235750/real 1564235755] req@ffff8f10566a7800 x1636747446435024/t0(0) o13->fir-OST002c-osc-MDT0002@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564235757 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 27 06:55:55 fir-md1-s1 kernel: Lustre: fir-OST0024-osc-MDT0002: Connection to fir-OST0024 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 06:55:55 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar 
messages Jul 27 06:55:55 fir-md1-s1 kernel: Lustre: 20212:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 27603:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=20 reqQ=0 recA=38, svcEst=20, delay=10811 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 27603:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 2 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: LNetError: 46517:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.3@o2ib6 from 10.0.10.51@o2ib7 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 71867:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f2ea2bd2a00 x1631631805000336/t434376894867(0) o35->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:21/0 lens 392/424 e 0 to 0 dl 1564235751 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 66901:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f37a5ed8c50 x1638250419074784/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564235764 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:55:57 fir-md1-s1 kernel: LNetError: 46517:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 17 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 20202:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564235750/real 1564235756] req@ffff8f346311a400 x1636747446434256/t0(0) o13->fir-OST0028-osc-MDT0000@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564235757 ref 2 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 71867:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 27 06:55:57 
fir-md1-s1 kernel: LustreError: 66901:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 15 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f312fb4c200 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 46592:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.1.35@o2ib6: deadline 6:5s ago req@ffff8f1fb9e50050 x1637402641553456/t0(0) o3->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:21/0 lens 488/0 e 0 to 0 dl 1564235751 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: fir-OST0028-osc-MDT0000: Connection to fir-OST0028 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 46592:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 48 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 46592:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. 
req@ffff8f1fb9e50050 x1637402641553456/t0(0) o3->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:21/0 lens 488/0 e 0 to 0 dl 1564235751 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -107 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1c4c039450 x1638250419074544/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564235750 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 22730:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2057dcec00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f312fb4a800 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f273f3f5000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f441bfed000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 46521:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34c63dda00 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 25086:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 11s req@ffff8f221935d700 x1635054618575936/t0(0) o103->b50adf7b-1eb0-aced-6fea-489b596a7b56@10.9.101.50@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 25086:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 66 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 
ffff8f4420bb7c00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 24567:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f4314ac2400 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2ec9b2b000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 22432:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f350e293c00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20506:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f350e290400 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -5+5s req@ffff8f1e0769e450 x1639235189826768/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564235751 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 6 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 06:55:57 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34c63dba00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4314ac4000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350e295000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f29937e0000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f2057dc9e00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2613196c00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350e290000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350e293400 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20731eda00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350e297400 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350e294600 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 46567:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f4430b64200 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 21040:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f077125d000 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f397da55a00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 21744:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f29937e0400 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2eaf2abe00 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350e290000 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c8b77898-6bcc-54d7-a771-0cfafa351f86 (at 10.9.101.6@o2ib4), client will retry: rc = -110 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 
10305:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cbe2c229dad0 vs. last_xid 5cbe2c229dadf req@ffff8f06ce237200 x1631549664123600/t0(0) o101->a0307bc6-c839-435b-9342-1c622269d753@10.9.105.34@o2ib4:2/0 lens 1768/0 e 0 to 0 dl 1564235762 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 27 06:55:57 fir-md1-s1 kernel: LustreError: 21038:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f350e293800 Jul 27 06:55:57 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.52@o2ib7: connected Jul 27 06:55:57 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 2 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 97600:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=27 reqQ=0 recA=29, svcEst=12, delay=10815 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 97600:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 97600:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-6s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f2826d00850 x1639155672648992/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564235751 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: 97600:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c4a74d2b-de98-9a37-7ebb-5f19657dadd1 (at 10.9.108.2@o2ib4), client will retry: rc = -110 Jul 27 06:55:57 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 27 06:56:00 fir-md1-s1 kernel: LustreError: 46564:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+10s req@ffff8f24b4b9f450 x1631580873422896/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:20/0 lens 488/440 e 0 to 0 dl 1564235750 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1800347-72ce-eadd-608d-51a435000390 (at 10.9.112.15@o2ib4), client will retry: rc -110 Jul 27 06:56:00 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 27 06:56:00 fir-md1-s1 kernel: Lustre: 27482:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:10s); client may timeout. 
req@ffff8f24b4b9e050 x1638886053164960/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564235750 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 06:56:00 fir-md1-s1 kernel: Lustre: 27482:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 48 previous similar messages Jul 27 06:56:00 fir-md1-s1 kernel: LustreError: 46564:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 17 previous similar messages Jul 27 06:56:00 fir-md1-s1 kernel: Lustre: 10305:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f14fb4bb600 x1631540506142560/t0(0) o101->0c135d2b-4dc8-6941-1c24-78d07ee5211f@10.9.103.36@o2ib4:5/0 lens 480/568 e 1 to 0 dl 1564235765 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:00 fir-md1-s1 kernel: Lustre: 10305:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 27 06:56:03 fir-md1-s1 kernel: LustreError: 21686:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f39aa1fe050 x1631628451819488/t0(0) o4->c4a74d2b-de98-9a37-7ebb-5f19657dadd1@10.9.108.2@o2ib4:16/0 lens 488/448 e 1 to 0 dl 1564235776 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 06:56:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c4a74d2b-de98-9a37-7ebb-5f19657dadd1 (at 10.9.108.2@o2ib4), client will retry: rc = -110 Jul 27 06:56:03 fir-md1-s1 kernel: LustreError: 21686:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 55 previous similar messages Jul 27 06:56:04 fir-md1-s1 kernel: Lustre: 23717:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f42992e1b00 x1638853131753536/t0(0) o101->067c479d-9c5c-ba9a-1825-5f3ac7b0af53@10.9.103.23@o2ib4:9/0 lens 480/568 e 1 to 0 dl 1564235769 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c8b77898-6bcc-54d7-a771-0cfafa351f86 (at 10.9.101.6@o2ib4), client will retry: rc 
= -110 Jul 27 06:56:07 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 27 06:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 27 06:56:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 06:56:09 fir-md1-s1 kernel: Lustre: 21541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f19fffe2850 x1638887880334944/t0(0) o3->11f7dba6-7171-5836-2062-1974c5637c6a@10.8.28.11@o2ib6:14/0 lens 488/440 e 0 to 0 dl 1564235774 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:14 fir-md1-s1 kernel: LustreError: 20506:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 18+0s req@ffff8f2d7bed7c50 x1631588989454896/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:14/0 lens 488/440 e 0 to 0 dl 1564235774 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:14 fir-md1-s1 kernel: LustreError: 20506:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 06:56:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with b1560181-32d0-3000-87fb-1969e5df2f5e (at 10.9.101.68@o2ib4), client will retry: rc = -110 Jul 27 06:56:17 fir-md1-s1 kernel: Lustre: 23717:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3b7cf2c500 x1631654825922176/t0(0) o101->84b23abe-92b9-23b5-f8e1-877bc9a84312@10.9.103.15@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1564235782 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:17 fir-md1-s1 kernel: Lustre: 23717:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Jul 27 06:56:24 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+10s req@ffff8f19fffe2850 x1638887880334944/t0(0) o3->11f7dba6-7171-5836-2062-1974c5637c6a@10.8.28.11@o2ib6:14/0 lens 488/440 e 0 to 0 dl 
1564235774 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:24 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 9 previous similar messages Jul 27 06:56:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 11f7dba6-7171-5836-2062-1974c5637c6a (at 10.8.28.11@o2ib6), client will retry: rc -110 Jul 27 06:56:24 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 06:56:24 fir-md1-s1 kernel: Lustre: 69438:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:10s); client may timeout. req@ffff8f19fffe2850 x1638887880334944/t0(0) o3->11f7dba6-7171-5836-2062-1974c5637c6a@10.8.28.11@o2ib6:14/0 lens 488/440 e 0 to 0 dl 1564235774 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 06:56:24 fir-md1-s1 kernel: Lustre: 69438:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 06:56:26 fir-md1-s1 kernel: Lustre: 23580:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10dc3dd100 x1638894897873088/t0(0) o101->70f17c05-8e9e-e3e3-0fb3-adadf2c8b10a@10.9.103.22@o2ib4:1/0 lens 480/568 e 1 to 0 dl 1564235791 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 06:56:26 fir-md1-s1 kernel: Lustre: 23580:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 27 06:56:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 34s: evicting client at 10.9.103.20@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f250f6d18c0/0x5d9ee68b5ea7f042 lrc: 3/0,0 mode: PW/PW res: [0x2c002c063:0x6b32:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.9.103.20@o2ib4 remote: 0x88e4309adc72ceba expref: 494 pid: 97650 timeout: 3350845 lvb_type: 0 Jul 27 06:56:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 27 06:56:31 fir-md1-s1 kernel: 
LustreError: 10308:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1506233800 ns: mdt-fir-MDT0002_UUID lock: ffff8f0ed60b6540/0x5d9ee68b5eac3418 lrc: 3/0,0 mode: PW/PW res: [0x2c002c68b:0x12fe4:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.108.11@o2ib4 remote: 0x59608db4afca9868 expref: 4085 pid: 10308 timeout: 0 lvb_type: 0 Jul 27 06:56:42 fir-md1-s1 kernel: Lustre: 20553:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:5s); client may timeout. req@ffff8f44b5b98300 x1638919030896672/t0(0) o101->f46dce57-e0f0-08b3-6c14-cb80f5f23489@10.9.103.13@o2ib4:7/0 lens 480/536 e 0 to 0 dl 1564235797 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 06:56:42 fir-md1-s1 kernel: Lustre: 20553:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 27 06:58:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 27 06:58:42 fir-md1-s1 kernel: Lustre: Skipped 573 previous similar messages Jul 27 07:00:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 07:00:09 fir-md1-s1 kernel: Lustre: Skipped 1247 previous similar messages Jul 27 07:00:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 07:00:54 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 07:04:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 07:04:30 fir-md1-s1 kernel: Lustre: Skipped 1913 previous similar messages Jul 27 07:08:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 07:08:52 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 27 07:10:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 07:10:28 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 07:14:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 07:14:31 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 27 07:15:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 07:19:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 07:19:00 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 27 07:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 07:21:44 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 07:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 07:24:56 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 27 07:29:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 07:29:11 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 27 07:31:22 fir-md1-s1 kernel: Lustre: 23646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564237875/real 1564237875] req@ffff8f370ffab900 x1636747463299360/t0(0) o106->fir-MDT0002@10.9.108.10@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564237882 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 07:31:22 fir-md1-s1 kernel: Lustre: 23646:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 55 previous similar messages Jul 27 07:31:22 fir-md1-s1 kernel: LustreError: 68193:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f44f0faa050 x1638899198176880/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564237905 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:22 fir-md1-s1 kernel: LustreError: 68193:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 26 previous similar messages Jul 27 07:31:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with fba6feb3-1d06-9f10-9905-c04ad67c5c45 (at 10.9.115.13@o2ib4), client will retry: rc -110 Jul 27 07:31:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages 
Jul 27 07:31:23 fir-md1-s1 kernel: Lustre: 10195:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564237876/real 1564237876] req@ffff8f0cfaf7b600 x1636747463301568/t0(0) o104->fir-MDT0002@10.9.103.27@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564237883 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 07:31:23 fir-md1-s1 kernel: Lustre: 10195:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 27 07:31:26 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 07:31:26 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 6 previous similar messages Jul 27 07:31:26 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (5): c: 5, oc: 0, rc: 8 Jul 27 07:31:26 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 6 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: 27061:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=28 reqQ=0 recA=29, svcEst=1, delay=10450 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3124abd850 x1638906221731984/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564237905 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1f78dcbc50 x1638086220094720/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564237905 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 23 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: 23625:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564237875/real 0] req@ffff8f14b6166000 x1636747463301296/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 1 dl 1564237882 ref 3 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: 22287:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. 
req@ffff8f245c1a1800 x1638870719434960/t434395925832(0) o101->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:21/0 lens 1768/1192 e 0 to 0 dl 1564237881 ref 2 fl Complete:/0/0 rc 0/0 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: 22287:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: LNetError: 27602:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.4@o2ib6 from 10.0.10.51@o2ib7 Jul 27 07:31:27 fir-md1-s1 kernel: LNetError: 27602:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 16 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 27602:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f20dcde4800 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: 27061:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-6s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f1b9fb7b000 x1638906221732480/t0(0) o35->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:21/0 lens 392/456 e 0 to 0 dl 1564237881 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: 27061:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 26 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 07:31:27 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 13 previous similar messages Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 46568:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec9b2c000 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f350e294400 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34f41bf000 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 46512:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk READ req@ffff8f29b206b850 x1639235265070224/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564237905 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 27060:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5d1d4ab429cb0 vs. 
last_xid 5d1d4ab429dcf req@ffff8f2a88a79500 x1638086220094640/t0(0) o35->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:3/0 lens 392/0 e 0 to 0 dl 1564237893 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Jul 27 07:31:27 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f351e04ce00 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 3429bec6-fe2a-19ec-4f0c-bb576fed4ff4 (at 10.8.29.4@o2ib6), client will retry: rc = -110 Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 07:31:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 07:31:29 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3e780fb450 x1638875996544416/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564237917 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:29 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 21 previous similar messages Jul 27 07:31:30 fir-md1-s1 kernel: Lustre: 23725:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f33b57dfb00 x1638887654573472/t0(0) o101->9cb0b481-a543-cf79-4307-a21eb6ac928f@10.9.103.5@o2ib4:5/0 lens 480/568 e 1 to 0 dl 1564237895 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:30 fir-md1-s1 kernel: Lustre: 23725:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 27 07:31:33 fir-md1-s1 kernel: LustreError: 57787:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0993648450 x1636579297966352/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564237917 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 07:31:33 fir-md1-s1 kernel: Lustre: 21379:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early 
reply req@ffff8f347686ef00 x1640050600861040/t0(0) o101->bbaa1906-af49-bd8d-7e3e-fd864792512f@10.9.103.32@o2ib4:8/0 lens 480/568 e 1 to 0 dl 1564237898 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:33 fir-md1-s1 kernel: Lustre: 21379:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 27 07:31:33 fir-md1-s1 kernel: LustreError: 57787:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 27 previous similar messages Jul 27 07:31:34 fir-md1-s1 kernel: Lustre: 50584:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564237887/real 1564237887] req@ffff8f30d6196f00 x1636747463305712/t0(0) o104->fir-MDT0002@10.9.101.26@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564237894 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 07:31:34 fir-md1-s1 kernel: Lustre: 50584:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Jul 27 07:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 27 07:31:38 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 27 07:31:41 fir-md1-s1 kernel: Lustre: 10559:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f399e2a8f00 x1638872020945616/t0(0) o101->8121f333-c515-acdd-73eb-da654528e9bc@10.9.103.25@o2ib4:16/0 lens 480/568 e 1 to 0 dl 1564237906 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8121f333-c515-acdd-73eb-da654528e9bc (at 10.9.103.25@o2ib4) reconnecting Jul 27 07:31:47 fir-md1-s1 kernel: Lustre: Skipped 525 previous similar messages Jul 27 07:31:50 fir-md1-s1 kernel: Lustre: 21741:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3caa79d850 x1631571083167296/t0(0) o4->6c1d7a0f-3fbd-e272-bbab-46b21ff978f8@10.9.102.69@o2ib4:25/0 lens 520/456 e 0 to 0 dl 
1564237915 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:50 fir-md1-s1 kernel: Lustre: 21741:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Jul 27 07:31:52 fir-md1-s1 kernel: LustreError: 46530:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2375f9d050 x1638086220100288/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564237917 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:52 fir-md1-s1 kernel: LustreError: 46530:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 17 previous similar messages Jul 27 07:31:52 fir-md1-s1 kernel: Lustre: 21128:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (27:1s); client may timeout. req@ffff8f390114c800 x1638966185100288/t0(0) o101->6a159b93-cbcb-a910-1e2c-6484b2bca678@10.9.103.18@o2ib4:21/0 lens 480/536 e 1 to 0 dl 1564237911 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 07:31:52 fir-md1-s1 kernel: Lustre: 21128:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 24 previous similar messages Jul 27 07:31:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 92b76833-e0a6-d520-474e-2227f356d2b3 (at 10.9.109.61@o2ib4), client will retry: rc = -110 Jul 27 07:31:55 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 27 07:31:55 fir-md1-s1 kernel: LustreError: 22433:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f3caa79d850 x1631571083167296/t0(0) o4->6c1d7a0f-3fbd-e272-bbab-46b21ff978f8@10.9.102.69@o2ib4:25/0 lens 520/456 e 0 to 0 dl 1564237915 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 07:31:55 fir-md1-s1 kernel: LustreError: 22433:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 07:31:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.26@o2ib4 ns: mdt-fir-MDT0002_UUID lock: 
ffff8f33e2beba80/0x5d9ee68b7d9ee839 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6af:0xda:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.103.26@o2ib4 remote: 0xca3655998a8869c4 expref: 456 pid: 23759 timeout: 3352976 lvb_type: 0 Jul 27 07:31:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 27 07:31:57 fir-md1-s1 kernel: LustreError: 23740:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f3f588ff800 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f505caac0/0x5d9ee68b7da232ee lrc: 3/0,0 mode: PW/PW res: [0x2c002c6c9:0x13:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x50200400000020 nid: 10.9.103.26@o2ib4 remote: 0xca3655998a8869e7 expref: 3 pid: 23740 timeout: 0 lvb_type: 0 Jul 27 07:31:57 fir-md1-s1 kernel: Lustre: 23740:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:20s); client may timeout. req@ffff8f2d04797200 x1638871641151360/t0(0) o101->357ed5e6-797d-063b-772c-730368f05495@10.9.103.26@o2ib4:6/0 lens 480/536 e 1 to 0 dl 1564237896 ref 1 fl Complete:/0/0 rc -107/-107 Jul 27 07:32:06 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 39s: evicting client at 10.9.103.30@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0c3105d100/0x5d9ee68b7da23008 lrc: 3/0,0 mode: PW/PW res: [0x2c002c063:0x6b32:0x0].0x0 bits 0x40/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 10.9.103.30@o2ib4 remote: 0x898cabf53c6531a5 expref: 553 pid: 10195 timeout: 3352976 lvb_type: 0 Jul 27 07:32:06 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:4s); client may timeout. 
req@ffff8f26ab4bd100 x1634161939579696/t0(0) o101->32315fe6-6915-bd82-691a-5460d13ab6db@10.9.103.27@o2ib4:2/0 lens 480/536 e 0 to 0 dl 1564237922 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 07:32:06 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 07:32:06 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 27 07:32:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.13@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1681341b00/0x5d9ee68b7da255b6 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6bd:0xd8:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.103.13@o2ib4 remote: 0x2718299e5b5e4356 expref: 605 pid: 24580 timeout: 3352987 lvb_type: 0 Jul 27 07:32:07 fir-md1-s1 kernel: LustreError: 22004:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f270e058c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f1b9594ad00/0x5d9ee68b7da25a1d lrc: 3/0,0 mode: PW/PW res: [0x2c002c6c9:0x13:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x50200400000020 nid: 10.9.103.13@o2ib4 remote: 0x2718299e5b5e435d expref: 3 pid: 22004 timeout: 0 lvb_type: 0 Jul 27 07:32:17 fir-md1-s1 kernel: Lustre: 23638:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-15), not sending early reply req@ffff8f36b2b36c00 x1631654826531632/t0(0) o101->84b23abe-92b9-23b5-f8e1-877bc9a84312@10.9.103.15@o2ib4:22/0 lens 480/568 e 0 to 0 dl 1564237942 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 07:32:17 fir-md1-s1 kernel: Lustre: 23638:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 27 07:33:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 07:33:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 07:33:12 fir-md1-s1 kernel: LustreError: 23677:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564237902, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f438ec74800/0x5d9ee68b7dcd6de7 lrc: 3/0,1 mode: --/PW res: [0x2c002c6c9:0x13:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23677 timeout: 0 lvb_type: 0 Jul 27 07:33:26 fir-md1-s1 kernel: LustreError: 23735:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564237916, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3c0116b840/0x5d9ee68b7e00fb80 lrc: 3/0,1 mode: --/PW res: [0x2c002c6c9:0x13:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23735 timeout: 0 lvb_type: 0 Jul 27 07:33:37 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564237927, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f22b58ac5c0/0x5d9ee68b7e27ae94 lrc: 3/0,1 mode: --/PW res: [0x2c002c6c9:0x13:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24580 timeout: 0 lvb_type: 0 Jul 27 07:33:37 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 27 07:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3b25e774-b2f1-cad2-2b7c-1db6845ecc3a (at 10.9.103.15@o2ib4) Jul 27 07:34:58 fir-md1-s1 kernel: Lustre: Skipped 917 previous similar messages Jul 27 07:35:02 fir-md1-s1 kernel: LNet: Service thread pid 23677 was inactive for 200.42s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 27 07:35:02 fir-md1-s1 kernel: Pid: 23677, comm: mdt03_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 07:35:02 fir-md1-s1 kernel: Call Trace: Jul 27 07:35:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 07:35:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 07:35:02 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 07:35:02 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 07:35:02 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 07:35:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 07:35:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 07:35:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 07:35:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 07:35:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 07:35:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564238102.23677 Jul 27 07:35:17 fir-md1-s1 kernel: LNet: Service thread pid 23735 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 07:35:17 fir-md1-s1 kernel: Pid: 23735, comm: mdt03_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 07:35:17 fir-md1-s1 kernel: Call Trace: Jul 27 07:35:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 07:35:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 07:35:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 07:35:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 07:35:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 07:35:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 07:35:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 07:35:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 07:35:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 07:35:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 07:35:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564238117.23735 Jul 27 07:35:28 fir-md1-s1 kernel: LNet: Service thread pid 20724 was inactive for 200.25s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 07:35:28 fir-md1-s1 kernel: Pid: 20724, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 07:35:28 fir-md1-s1 kernel: Call Trace: Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 07:35:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 07:35:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 07:35:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564238128.20724 Jul 27 07:35:28 fir-md1-s1 kernel: Pid: 24580, comm: mdt01_058 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 07:35:28 fir-md1-s1 kernel: Call Trace: Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] 
Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 07:35:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 07:35:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 07:35:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 07:40:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 07:40:22 fir-md1-s1 kernel: Lustre: Skipped 272 previous similar messages Jul 27 07:41:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 07:41:56 fir-md1-s1 kernel: Lustre: Skipped 157 previous similar messages Jul 27 07:45:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6b80f591-7f85-840f-613a-8029966db0a5 (at 10.9.103.36@o2ib4) Jul 27 07:45:03 fir-md1-s1 kernel: Lustre: Skipped 138 previous similar messages Jul 27 07:46:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 07:46:48 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 27 07:51:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 07:51:37 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 07:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 07:51:58 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 27 07:55:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3b25e774-b2f1-cad2-2b7c-1db6845ecc3a (at 10.9.103.15@o2ib4) Jul 27 07:55:08 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Jul 27 07:56:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 07:56:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 08:01:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 08:01:43 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 27 08:02:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0c135d2b-4dc8-6941-1c24-78d07ee5211f (at 10.9.103.36@o2ib4) reconnecting Jul 27 08:02:06 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 27 08:02:17 fir-md1-s1 kernel: LNet: Service thread pid 23677 completed after 1835.73s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 21452:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=2, svcEst=1, delay=9103 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 21452:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 2 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 46556:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f299bb0e450 x1638886133318352/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:15/0 lens 488/0 e 0 to 0 dl 1564239945 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 46556:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 30 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 21498:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.112.6@o2ib4: deadline 6:4s ago req@ffff8f350c7c2450 x1638831197473504/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:15/0 lens 488/0 e 0 to 0 dl 1564239945 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 21498:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 21792:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:4s); client may timeout. 
req@ffff8f2f4e2f9450 x1640014392784384/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:15/0 lens 488/0 e 0 to 0 dl 1564239945 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 21792:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 55487:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 10s req@ffff8f41d454c850 x1638881373488688/t0(0) o400->81e8ff24-44bf-7701-fc96-67a6d1b14698@10.9.104.14@o2ib4:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 21380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564239939/real 1564239939] req@ffff8f2ddd29b600 x1636747480950896/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564239946 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 82ccbbe6-4250-f7c3-e39a-58b58ca31763 (at 10.0.10.106@o2ib7) Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: Skipped 126 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:305:request_in_callback()) event type 2, status -103, service mdt_io Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 46519:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 46519:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.103.14@o2ib4 x1639323821545136 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 21684:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -5+5s req@ffff8f38cdf4a850 x1638250521938784/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1564239944 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:05:50 fir-md1-s1 kernel: LNetError: 
20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 08:05:50 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 20 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f14402a9c00 Jul 27 08:05:50 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 8 seconds Jul 27 08:05:50 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f263d6c2c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f249e1cce00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0bd6396c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1dd1b48800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f41595b1e00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4420bb1800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2057dcda00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1925f93c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0bd6394200 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 
20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a08c48a00 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b765dee00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0bd6396600 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3980232800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f394bcbde00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0bd6396800 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: fir-OST0003-osc-MDT0002: Connection to fir-OST0003 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3980237e00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1a08c4e800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2057dccc00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1dd1b49800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) 
event type 5, status -5, desc ffff8f2057dcf400 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f263d6c2200 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dd1b4a600 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4420bb6400 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0743736c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2844708600 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1925f93c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f353caac000 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0f20cdbe00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4420bb6200 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f263d6c1000 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f12d89b5c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0bd6395800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f249e1cf000 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) 
event type 5, status -5, desc ffff8f263d6c3000 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f44e7c37000 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f440ab49600 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f440ab4a200 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2057dcaa00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3b8d291800 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f44e7c35200 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2057dc8c00 Jul 27 08:05:50 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f14402aa600 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 23609:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=2, svcEst=1, delay=9105 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 23609:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 12 previous similar messages Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 16186:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06e9b88c50 x1638935638639296/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564239955 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:05:50 fir-md1-s1 kernel: Lustre: 16186:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 27 
08:05:56 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564239949/real 1564239949] req@ffff8f0cd15bc800 x1636747480950880/t0(0) o104->fir-MDT0002@10.9.103.36@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564239956 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 08:05:56 fir-md1-s1 kernel: Lustre: 21380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564239949/real 1564239949] req@ffff8f2ddd29b600 x1636747480950896/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564239956 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 08:05:56 fir-md1-s1 kernel: Lustre: 21380:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Jul 27 08:05:57 fir-md1-s1 kernel: Lustre: 20729:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564239949/real 1564239949] req@ffff8f1e1c180900 x1636747480950704/t0(0) o104->fir-MDT0002@10.9.103.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564239956 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 08:05:57 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:13s); client may timeout. 
req@ffff8f1f332e8f00 x1638872297433920/t0(0) o101->191e7928-23a0-eccc-c908-3ef7952d34e9@10.9.103.35@o2ib4:14/0 lens 480/536 e 0 to 0 dl 1564239944 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 08:05:57 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 79 previous similar messages Jul 27 08:05:58 fir-md1-s1 kernel: LustreError: 21289:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 9+0s req@ffff8f07e6d13c50 x1638086294858080/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:28/0 lens 488/440 e 1 to 0 dl 1564239958 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:05:58 fir-md1-s1 kernel: LustreError: 21289:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 52 previous similar messages Jul 27 08:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -110 Jul 27 08:05:58 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 27 08:06:03 fir-md1-s1 kernel: Lustre: 21496:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2c4b7a5450 x1638877150321216/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564239968 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:03 fir-md1-s1 kernel: Lustre: 21496:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 27 08:06:07 fir-md1-s1 kernel: LustreError: 20499:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+9s req@ffff8f0be602d850 x1637106828985120/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:28/0 lens 488/440 e 1 to 0 dl 1564239958 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:07 fir-md1-s1 kernel: LustreError: 24069:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+9s req@ffff8f0be602a050 x1639155816829552/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:28/0 lens 
488/440 e 1 to 0 dl 1564239958 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 27 08:06:07 fir-md1-s1 kernel: Lustre: 24069:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f0be602a050 x1639155816829552/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:28/0 lens 488/440 e 1 to 0 dl 1564239958 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 08:06:11 fir-md1-s1 kernel: Lustre: 10307:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ef3ee1200 x1638872021296896/t0(0) o101->8121f333-c515-acdd-73eb-da654528e9bc@10.9.103.25@o2ib4:16/0 lens 480/568 e 1 to 0 dl 1564239976 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:11 fir-md1-s1 kernel: Lustre: 10307:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 46 previous similar messages Jul 27 08:06:17 fir-md1-s1 kernel: LustreError: 14789:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+9s req@ffff8f0be602c050 x1639235324408912/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564239968 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:17 fir-md1-s1 kernel: LustreError: 14789:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 35 previous similar messages Jul 27 08:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 27 08:06:17 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 27 08:06:17 fir-md1-s1 kernel: Lustre: 14789:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:9s); client may timeout. 
req@ffff8f0be602c050 x1639235324408912/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564239968 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 08:06:17 fir-md1-s1 kernel: Lustre: 14789:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 08:06:19 fir-md1-s1 kernel: LustreError: 21484:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f06e9b88c50 x1638935638639296/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564239955 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:19 fir-md1-s1 kernel: LustreError: 21484:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 29 previous similar messages Jul 27 08:06:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.33@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1af99b9f80/0x5d9ee68b99905961 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6ca:0x18:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.9.103.33@o2ib4 remote: 0x67df4151d22fd1e9 expref: 1304 pid: 26258 timeout: 3355042 lvb_type: 0 Jul 27 08:06:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.23@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1a674269c0/0x5d9ee68b997d5b4e lrc: 3/0,0 mode: PW/PW res: [0x2c002c6ae:0xd4:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.9.103.23@o2ib4 remote: 0xcb8e074402805735 expref: 524 pid: 22281 timeout: 3355044 lvb_type: 0 Jul 27 08:06:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 27 08:06:28 fir-md1-s1 kernel: Lustre: 20461:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e08ff9200 x1631797413567552/t0(0) 
o101->1ecbe639-5eea-5e69-de1c-c50e9b9738eb@10.8.8.20@o2ib6:3/0 lens 576/3264 e 1 to 0 dl 1564239993 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:06:28 fir-md1-s1 kernel: Lustre: 20461:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 27 08:06:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2120770480/0x5d9ee68b99b5797e lrc: 3/0,0 mode: PW/PW res: [0x2c002c6c7:0x1b:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.103.35@o2ib4 remote: 0xd35d4f400bce6660 expref: 841 pid: 20729 timeout: 3355046 lvb_type: 0 Jul 27 08:06:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.102.44@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f118c03e300/0x5d9ee68b9884f119 lrc: 3/0,0 mode: PR/PR res: [0x2c002bdde:0xc00c:0x0].0x0 bits 0x13/0x0 rrc: 433 type: IBT flags: 0x60200400000020 nid: 10.9.102.44@o2ib4 remote: 0x4048d24191b7b5ac expref: 6272 pid: 10505 timeout: 3355058 lvb_type: 0 Jul 27 08:06:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 27 08:06:40 fir-md1-s1 kernel: Lustre: 23674:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f38bd740c00 x1636442168984256/t0(0) o101->829e8e6e-3608-cb1f-779c-fe5437a6c742@10.9.102.33@o2ib4:9/0 lens 576/536 e 0 to 0 dl 1564239999 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 08:06:40 fir-md1-s1 kernel: Lustre: 23674:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 08:06:41 fir-md1-s1 kernel: LustreError: 23728:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f31c63e1800 x1636747481103296/t0(0) o104->fir-MDT0002@10.9.102.44@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 27 08:11:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 08:11:29 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 08:12:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 08:12:29 fir-md1-s1 kernel: Lustre: Skipped 1998 previous similar messages Jul 27 08:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 08:16:09 fir-md1-s1 kernel: Lustre: Skipped 3004 previous similar messages Jul 27 08:17:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 08:17:53 fir-md1-s1 kernel: Lustre: Skipped 991 previous similar messages Jul 27 08:22:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 08:22:59 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 08:23:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 08:23:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 08:26:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 08:26:10 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 08:28:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 08:28:04 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 08:33:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 08:33:05 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 08:34:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 08:34:42 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 08:36:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 08:36:14 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 27 08:36:59 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 08:36:59 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (5): c: 5, oc: 0, rc: 8 Jul 27 08:37:00 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f39ab2b4850 x1638952431588032/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564241834 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:00 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: mdt: This 
server is not able to keep up with request traffic (cpu-bound). Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 69438:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=19 reqQ=0 recA=30, svcEst=20, delay=9780 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 44040:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.15.8@o2ib6: deadline 6:4s ago req@ffff8f253ea4b450 x1633753960872496/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564241820 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 44040:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 69438:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f1c9dcd7c50 x1633753960872608/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564241820 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 69438:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 82 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 44040:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:4s); client may timeout. 
req@ffff8f253ea4b450 x1633753960872496/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:0/0 lens 488/0 e 0 to 0 dl 1564241820 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f39ab2b5050 x1639235406147552/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564241844 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 22428:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f09e7f49450 x1639155880128688/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564241820 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 46592:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -5+5s req@ffff8f1c9dcd2850 x1638935702332848/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:29/0 lens 488/440 e 0 to 0 dl 1564241819 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 21005:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 10s req@ffff8f26fdd81e00 x1631632019796400/t0(0) o35->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 21005:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 6 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 23621:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564241816/real 1564241816] req@ffff8f3d8c8c6600 x1636747498331248/t0(0) 
o104->fir-MDT0002@10.9.103.16@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564241823 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 18 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f394bcb9800 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f394bcbfe00 Jul 27 08:37:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 1 seconds Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3980233c00 Jul 27 08:37:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 21 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0704f11200 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f28ac2dd600 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1d30eec600 Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 
20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.212@o2ib7 (0): c: 0, oc: 0, rc: 2 Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2ec1e8b200 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2ec1e8d200 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0789e7e600 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0789e78e00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3980236e00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0771258000 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f394bcbc600 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44e7c36600 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1635fba400 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3b3c3e7000 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f28ac2df400 Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 
21039:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.29.4@o2ib6 from 10.0.10.51@o2ib7 Jul 27 08:37:05 fir-md1-s1 kernel: LNetError: 21039:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 16 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20507:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f28ac2df400 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 22650:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0cad2ff400 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 22973:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ddfbd0a00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 21294:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34686d2a00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f13e93afa00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 22181:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3980231c00 Jul 27 08:37:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.209@o2ib7: connected Jul 27 08:37:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 2 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2ec1e8ca00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1d30eedc00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2302a29c00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44e7c36400 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f394bcbfc00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44e7c31e00 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f19e696f800 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1635fba200 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b3c3e4600 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3980231600 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f394bcba000 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f284470d800 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0e4cdea000 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 23577:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=3 reqQ=0 recA=3, svcEst=1, delay=9080 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 23577:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 9 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 23577:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0f089b8f00 x1638887655757088/t0(0) o101->9cb0b481-a543-cf79-4307-a21eb6ac928f@10.9.103.5@o2ib4:0/0 lens 480/568 e 0 to 0 dl 1564241820 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 23577:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 82 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564241818/real 1564241824] req@ffff8f35b9cbe000 x1636747498331552/t0(0) o41->fir-MDT0003-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1564241825 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 20244:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564241818/real 1564241824] req@ffff8f3d8c8c1b00 x1636747498331616/t0(0) o41->fir-MDT0003-osp-MDT0002@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1564241825 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: 20244:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0003-osp-MDT0002: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 08:37:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 71851:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -6+6s req@ffff8f134d368f00 x1633662378901104/t0(0) o37->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:29/0 lens 448/440 e 0 to 0 dl 1564241819 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:05 fir-md1-s1 kernel: LustreError: 71851:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 28 previous similar messages Jul 27 08:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc 
-110 Jul 27 08:37:07 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 27 08:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6a6c91bf-5994-6d2d-e34d-9ae740d430ac (at 10.9.107.29@o2ib4), client will retry: rc = -110 Jul 27 08:37:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 08:37:08 fir-md1-s1 kernel: Lustre: 20505:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2b12a11850 x1638886171495968/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564241833 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:08 fir-md1-s1 kernel: Lustre: 20505:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 27 08:37:09 fir-md1-s1 kernel: LustreError: 66902:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f37a7996050 x1639235406147760/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:4/0 lens 488/440 e 0 to 0 dl 1564241854 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 08:37:09 fir-md1-s1 kernel: LustreError: 21741:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f37a7996850 x1638870787709872/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:4/0 lens 488/440 e 0 to 0 dl 1564241854 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 08:37:09 fir-md1-s1 kernel: LustreError: 21741:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 13 previous similar messages Jul 27 08:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c4a74d2b-de98-9a37-7ebb-5f19657dadd1 (at 10.9.108.2@o2ib4), client will retry: rc = -110 Jul 27 08:37:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 08:37:13 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 9+0s req@ffff8f2b12a15450 x1637106857624256/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:13/0 lens 
488/440 e 1 to 0 dl 1564241833 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1800347-72ce-eadd-608d-51a435000390 (at 10.9.112.15@o2ib4), client will retry: rc -110 Jul 27 08:37:13 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 27 08:37:13 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 27 08:37:14 fir-md1-s1 kernel: Lustre: 66902:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3fcdf6ac50 x1631542527913168/t0(0) o4->1ae7de3e-f83c-4930-305c-63330132f512@10.9.107.60@o2ib4:19/0 lens 488/448 e 1 to 0 dl 1564241839 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:14 fir-md1-s1 kernel: Lustre: 66902:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Jul 27 08:37:15 fir-md1-s1 kernel: Lustre: 46545:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. 
req@ffff8f3e9f10b450 x1638250572822288/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564241833 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 08:37:15 fir-md1-s1 kernel: Lustre: 46545:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 79 previous similar messages Jul 27 08:37:17 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 14+4s req@ffff8f3997fb1850 x1631632019790624/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564241833 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:17 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 8 previous similar messages Jul 27 08:37:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 1ae7de3e-f83c-4930-305c-63330132f512 (at 10.9.107.60@o2ib4), client will retry: rc = -110 Jul 27 08:37:23 fir-md1-s1 kernel: Lustre: 21369:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/2), not sending early reply req@ffff8f1348deb900 x1638966186039760/t0(0) o101->6a159b93-cbcb-a910-1e2c-6484b2bca678@10.9.103.18@o2ib4:28/0 lens 480/568 e 1 to 0 dl 1564241848 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:23 fir-md1-s1 kernel: Lustre: 21369:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 30 previous similar messages Jul 27 08:37:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 27 08:37:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 27 08:37:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.4@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f44ea35d100/0x5d9ee68bb5796693 lrc: 3/0,0 mode: PW/PW res: [0x2c002c4f6:0xf2f:0x0].0x0 bits 0x40/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 
10.9.103.4@o2ib4 remote: 0xb4abefc93eeab499 expref: 549 pid: 23621 timeout: 3356903 lvb_type: 0 Jul 27 08:37:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 27 08:37:23 fir-md1-s1 kernel: Lustre: 23749:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:23s); client may timeout. req@ffff8f2a4ad07800 x1638872021575296/t0(0) o101->8121f333-c515-acdd-73eb-da654528e9bc@10.9.103.25@o2ib4:0/0 lens 480/536 e 0 to 0 dl 1564241820 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 08:37:23 fir-md1-s1 kernel: Lustre: 23749:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Jul 27 08:37:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.103.15@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f401fed0d80/0x5d9ee68bb53e7bca lrc: 3/0,0 mode: PW/PW res: [0x2c002c6c7:0x1e:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.9.103.15@o2ib4 remote: 0xe8bbe65a622f36e7 expref: 555 pid: 23711 timeout: 3356904 lvb_type: 0 Jul 27 08:37:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 27 08:37:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 60a9f157-4802-e53d-dccf-19f0d690f2d1 (at 10.9.0.1@o2ib4), client will retry: rc = -110 Jul 27 08:37:25 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+1s req@ffff8f1c9dcd5450 x1640014444019424/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1564241844 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:25 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 23 previous similar messages Jul 27 08:37:26 fir-md1-s1 kernel: LustreError: 46541:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on 
bulk WRITE req@ffff8f39ab2b1850 x1631589583088176/t0(0) o4->fac29dea-ab53-7d7a-c2b9-fb2a0ceb526e@10.9.102.53@o2ib4:25/0 lens 504/448 e 0 to 0 dl 1564241845 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 08:37:26 fir-md1-s1 kernel: LustreError: 46541:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 27 08:37:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc = -110 Jul 27 08:37:33 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 27 08:37:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 38s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f37b3b39b00/0x5d9ee68bb5179387 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6d1:0x24:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.103.35@o2ib4 remote: 0xd35d4f400bceebfa expref: 996 pid: 21436 timeout: 3356919 lvb_type: 0 Jul 27 08:37:48 fir-md1-s1 kernel: Lustre: 25680:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:48s); client may timeout. 
req@ffff8f289a4d3c00 x1638883623726400/t0(0) o101->3411ffac-482d-1535-c486-9206f14b07f9@10.9.103.6@o2ib4:0/0 lens 480/536 e 0 to 0 dl 1564241820 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 08:37:48 fir-md1-s1 kernel: Lustre: 25680:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 13 previous similar messages Jul 27 08:37:49 fir-md1-s1 kernel: LustreError: 23621:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f33b1233400 ns: mdt-fir-MDT0002_UUID lock: ffff8f0da9818000/0x5d9ee68bb5798993 lrc: 3/0,0 mode: PW/PW res: [0x2c002c6ce:0xd1:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x50200000000000 nid: 10.9.103.4@o2ib4 remote: 0xb4abefc93eeab4a7 expref: 2 pid: 23621 timeout: 0 lvb_type: 0 Jul 27 08:37:51 fir-md1-s1 kernel: LustreError: 23688:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f11bbd12400 ns: mdt-fir-MDT0002_UUID lock: ffff8f131aeb1200/0x5d9ee68bb5797676 lrc: 3/0,0 mode: PW/PW res: [0x2c002c4f6:0xf2f:0x0].0x0 bits 0x40/0x0 rrc: 17 type: IBT flags: 0x50200400000020 nid: 10.9.103.5@o2ib4 remote: 0xfd33651675b99d48 expref: 2 pid: 23688 timeout: 0 lvb_type: 0 Jul 27 08:38:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 08:38:07 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 27 08:38:13 fir-md1-s1 kernel: Lustre: 50445:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-44), not sending early reply req@ffff8f164121e000 x1631644098191600/t0(0) o101->558f43d5-0094-09f1-ffd1-c721e83928eb@10.9.103.8@o2ib4:18/0 lens 480/568 e 0 to 0 dl 1564241898 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 08:38:13 fir-md1-s1 kernel: Lustre: 50445:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 27 08:38:36 fir-md1-s1 kernel: LustreError: 23603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564241826, 90s ago); not entering 
recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3175fc0240/0x5d9ee68bb579bdcd lrc: 3/0,1 mode: --/PW res: [0x2c002c4f6:0xf2f:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23603 timeout: 0 lvb_type: 0 Jul 27 08:38:39 fir-md1-s1 kernel: LustreError: 97664:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564241829, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f16b5a8d580/0x5d9ee68bb57aad01 lrc: 3/0,1 mode: --/PW res: [0x2c002c4f6:0xf2f:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97664 timeout: 0 lvb_type: 0 Jul 27 08:39:21 fir-md1-s1 kernel: LustreError: 21434:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564241871, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f181fdb4380/0x5d9ee68bb5b6447e lrc: 3/0,1 mode: --/PW res: [0x2c002c4f6:0xf2f:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21434 timeout: 0 lvb_type: 0 Jul 27 08:40:27 fir-md1-s1 kernel: LNet: Service thread pid 23603 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 08:40:27 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 27 08:40:27 fir-md1-s1 kernel: Pid: 23603, comm: mdt02_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 08:40:27 fir-md1-s1 kernel: Call Trace: Jul 27 08:40:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 08:40:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 08:40:27 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 08:40:27 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 08:40:27 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 08:40:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 08:40:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 08:40:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 08:40:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 08:40:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 08:40:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564242027.23603 Jul 27 08:40:29 fir-md1-s1 kernel: LNet: Service thread pid 97664 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 08:40:29 fir-md1-s1 kernel: Pid: 97664, comm: mdt01_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 08:40:29 fir-md1-s1 kernel: Call Trace: Jul 27 08:40:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 08:40:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 08:40:29 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 08:40:29 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 08:40:29 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 08:40:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 08:40:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 08:40:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 08:40:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 08:40:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 08:40:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564242029.97664 Jul 27 08:41:11 fir-md1-s1 kernel: LNet: Service thread pid 21434 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 27 08:41:11 fir-md1-s1 kernel: Pid: 21434, comm: mdt01_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 27 08:41:11 fir-md1-s1 kernel: Call Trace: Jul 27 08:41:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 27 08:41:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 27 08:41:11 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 27 08:41:11 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Jul 27 08:41:11 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Jul 27 08:41:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 27 08:41:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 27 08:41:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 27 08:41:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 27 08:41:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 27 08:41:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564242071.21434 Jul 27 08:43:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ec8e478a-93b2-34d3-2772-2238a12dddbe (at 10.8.18.28@o2ib6) reconnecting Jul 27 08:43:08 fir-md1-s1 kernel: Lustre: Skipped 361 previous similar messages Jul 27 08:46:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 08:46:21 fir-md1-s1 kernel: Lustre: Skipped 474 previous 
similar messages Jul 27 08:46:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 08:46:35 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 27 08:49:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 27 08:49:53 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 27 08:53:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 314a852d-d223-3f57-2ae7-41d5f031741d (at 10.9.103.1@o2ib4) reconnecting Jul 27 08:53:13 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 27 08:56:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b0f95b8-bc58-6a77-0e21-a3225e91db7a (at 10.9.103.1@o2ib4) Jul 27 08:56:22 fir-md1-s1 kernel: Lustre: Skipped 140 previous similar messages Jul 27 08:59:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 08:59:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 09:00:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 09:00:43 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 27 09:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 314a852d-d223-3f57-2ae7-41d5f031741d (at 10.9.103.1@o2ib4) reconnecting Jul 27 09:03:22 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 27 09:06:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 09:06:27 fir-md1-s1 kernel: Lustre: Skipped 138 previous similar messages Jul 27 09:09:43 fir-md1-s1 kernel: LNet: Service thread pid 23603 completed after 1956.33s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 27 09:09:43 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Jul 27 09:10:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 09:10:45 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 27 09:11:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 09:11:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 09:11:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 09:11:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (6): c: 0, oc: 0, rc: 7 Jul 27 09:11:37 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 09:11:37 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 4 previous similar messages Jul 27 09:11:37 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (7): c: 5, oc: 0, rc: 8 Jul 27 09:11:37 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 4 previous similar messages Jul 27 09:11:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 0 seconds Jul 27 09:11:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 30 previous similar messages Jul 27 09:11:38 fir-md1-s1 kernel: Lustre: 20204:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564243891/real 0] req@ffff8f06dfb4fb00 x1636747516727968/t0(0) o41->fir-MDT0003-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1564243898 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:11:38 fir-md1-s1 kernel: Lustre: fir-MDT0003-osp-MDT0000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:11:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 09:11:39 fir-md1-s1 kernel: Lustre: 20202:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564243892/real 0] 
req@ffff8f10d9614800 x1636747516728528/t0(0) o13->fir-OST0027-osc-MDT0000@10.0.10.108@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564243899 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:11:39 fir-md1-s1 kernel: Lustre: fir-OST0027-osc-MDT0000: Connection to fir-OST0027 (at 10.0.10.108@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:11:39 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Jul 27 09:11:39 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.101@o2ib7 (9): c: 0, oc: 0, rc: 8 Jul 27 09:11:39 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 2 seconds Jul 27 09:11:39 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 1 previous similar message Jul 27 09:11:39 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f179aa7ee00 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound). Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 25634:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=33 reqQ=0 recA=44, svcEst=20, delay=8506 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 25634:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46592:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.10@o2ib4: deadline 6:3s ago req@ffff8f22b7da9c50 x1638906459661504/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:6/0 lens 488/0 e 0 to 0 dl 1564243896 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 22990:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. 
Consider increasing at_early_margin (5)? req@ffff8f3dc9b0c850 x1638906459661648/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:7/0 lens 488/0 e 0 to 0 dl 1564243897 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46592:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 16 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LNetError: 21737:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.9.113.5@o2ib4 from 10.0.10.51@o2ib7 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 46529:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff8f1ee17c9050 x1639509886405472/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:6/0 lens 488/0 e 0 to 0 dl 1564243896 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 22990:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LNetError: 21737:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 6 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 46529:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 22730:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1c31895400 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46589:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1d248e7200 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 69438:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f448d29da00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 97599:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec5a2a000 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3b3c3e4200 Jul 27 09:11:40 
fir-md1-s1 kernel: LustreError: 21291:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f15aebae200 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f41595b0c00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0f20cdae00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0f20cdd800 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46515:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ec5a2dc00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 22730:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff8f1ee17c9c50 x1638870839875552/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564243896 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 22730:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46576:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f16e6095800 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 2f627314-68e3-35d2-70d7-0cd2604dd048 (at 10.9.115.4@o2ib4), client will retry: rc -110 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46533:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1706328e00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 44044:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f41b27cdc00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 46514:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f28ac2d8600 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 
21496:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0f20cdd600 Jul 27 09:11:40 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 09:11:40 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 8 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec5a2b600 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15aebac400 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ee6587000 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33efb28600 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e607e2a00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b3c3e3c00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d248e7200 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0f20cdde00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ee6585600 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec5a2b200 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d248e7800 Jul 27 09:11:40 fir-md1-s1 kernel: 
LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f448d29f800 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16e6097800 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2057dcbc00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ee6584400 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28ac2dd000 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f446ffd1800 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f446ffd5a00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0f20cd8200 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f179aa7c800 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f22b7da9850 x1631581003394944/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564243896 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 20 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c597d7e00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15aebaf800 Jul 27 09:11:40 fir-md1-s1 kernel: 
LustreError: 69435:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f19845d0850 x1638086426025520/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564243920 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 69435:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f287dbaa600 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f446ffd6c00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f446ffd5c00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38a939dc00 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f41595b7c00 Jul 27 09:11:40 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.210@o2ib7: connected Jul 27 09:11:40 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 1 previous similar message Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc = -110 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 71867:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=3 reqQ=0 recA=0, svcEst=9, delay=7770 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 71867:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 3 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: 
Lustre: 71867:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f341e66d400 x1637986217512816/t0(0) o37->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:6/0 lens 448/408 e 0 to 0 dl 1564243896 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 09:11:40 fir-md1-s1 kernel: Lustre: 71867:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 81 previous similar messages Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 21833:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0c6dd91e00 x1636452981352224/t0(0) o37->f649fa1c-e7fe-d613-2a65-337c97d2e136@10.9.108.54@o2ib4:5/0 lens 448/440 e 0 to 0 dl 1564243895 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:40 fir-md1-s1 kernel: LustreError: 21833:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 2 previous similar messages Jul 27 09:11:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 712591ba-92f6-6e19-2523-c1aaf8221bbf (at 10.9.106.62@o2ib4), client will retry: rc = -110 Jul 27 09:11:41 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 09:11:45 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f062fc95100 x1631654827858000/t0(0) o101->84b23abe-92b9-23b5-f8e1-877bc9a84312@10.9.103.15@o2ib4:20/0 lens 480/568 e 1 to 0 dl 1564243910 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:45 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 09:11:46 fir-md1-s1 kernel: Lustre: 10305:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564243899/real 1564243899] req@ffff8f07c2e78900 x1636747516731696/t0(0) o104->fir-MDT0002@10.9.103.8@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564243906 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:11:46 fir-md1-s1 kernel: LustreError: 
27604:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2462869450 x1631550771286560/t0(0) o4->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:29/0 lens 488/448 e 1 to 0 dl 1564243919 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 09:11:46 fir-md1-s1 kernel: LustreError: 27604:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 54 previous similar messages Jul 27 09:11:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc = -110 Jul 27 09:11:46 fir-md1-s1 kernel: Lustre: 10305:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 60 previous similar messages Jul 27 09:11:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -110 Jul 27 09:11:47 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 27 09:11:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c4a74d2b-de98-9a37-7ebb-5f19657dadd1 (at 10.9.108.2@o2ib4), client will retry: rc = -110 Jul 27 09:11:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 09:11:54 fir-md1-s1 kernel: Lustre: 46563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23cec4d850 x1638877273026320/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564243919 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:54 fir-md1-s1 kernel: LustreError: 57787:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1488e01050 x1638825141586864/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564243920 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:54 fir-md1-s1 kernel: LustreError: 57787:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 51 previous similar messages Jul 27 09:11:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error 
with f7baec68-f8c8-0730-9508-ba1e77698953 (at 10.9.114.6@o2ib4), client will retry: rc -110 Jul 27 09:11:54 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 27 09:11:59 fir-md1-s1 kernel: LustreError: 44039:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f23cec4d850 x1638877273026320/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564243919 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:11:59 fir-md1-s1 kernel: LustreError: 44039:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 41 previous similar messages Jul 27 09:12:02 fir-md1-s1 kernel: Lustre: 21037:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f2ee58e6450 x1640014490291920/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:0/0 lens 488/440 e 0 to 0 dl 1564243920 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 09:12:02 fir-md1-s1 kernel: Lustre: 21037:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 90 previous similar messages Jul 27 09:12:04 fir-md1-s1 kernel: Lustre: 21715:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0a2589c450 x1635091079560512/t0(0) o4->731c6d4a-c0f1-3f82-e1ef-8266de117fd6@10.9.109.46@o2ib4:9/0 lens 488/448 e 0 to 0 dl 1564243929 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:12:04 fir-md1-s1 kernel: Lustre: 21715:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Jul 27 09:12:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with be35deb0-99fb-c9aa-273e-c640ee5c1974 (at 10.8.28.6@o2ib6), client will retry: rc = -110 Jul 27 09:12:06 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 09:12:08 fir-md1-s1 kernel: LustreError: 21684:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+8s req@ffff8f3dc9b0ec50 x1638876147777088/t0(0) 
o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564243920 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:12:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c4566649-5001-d956-15cb-934d725d7f29 (at 10.9.113.11@o2ib4), client will retry: rc -110 Jul 27 09:12:08 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 09:12:08 fir-md1-s1 kernel: LustreError: 21684:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 11 previous similar messages Jul 27 09:12:20 fir-md1-s1 kernel: Lustre: 23731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f386d70e600 x1638003628424880/t0(0) o101->7653d4a1-59ee-f80a-c329-01dbb9c49143@10.9.106.70@o2ib4:25/0 lens 576/3264 e 1 to 0 dl 1564243945 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:12:20 fir-md1-s1 kernel: Lustre: 23731:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Jul 27 09:12:27 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 32s: evicting client at 10.9.102.39@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f19468b0240/0x5d9ee68bd3ee2947 lrc: 3/0,0 mode: PR/PR res: [0x2c002bdde:0xc00c:0x0].0x0 bits 0x13/0x0 rrc: 496 type: IBT flags: 0x60200400000020 nid: 10.9.102.39@o2ib4 remote: 0x7b11979dddf7650c expref: 1556 pid: 26255 timeout: 3359004 lvb_type: 0 Jul 27 09:12:27 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 27 09:12:28 fir-md1-s1 kernel: Lustre: 23681:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f31f6f40600 x1631935375661712/t0(0) o101->cf27766b-7a06-85c3-e1d8-3a06956d665b@10.8.20.11@o2ib6:27/0 lens 576/536 e 0 to 0 dl 1564243947 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 09:12:28 fir-md1-s1 kernel: Lustre: 23681:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 09:12:28 fir-md1-s1 kernel: LustreError: 21678:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f4181866600 x1636747517182928/t0(0) o104->fir-MDT0002@10.9.108.56@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 27 09:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 09:14:02 fir-md1-s1 kernel: Lustre: Skipped 709 previous similar messages Jul 27 09:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 09:16:28 fir-md1-s1 kernel: Lustre: Skipped 1077 previous similar messages Jul 27 09:21:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 09:21:00 fir-md1-s1 kernel: Lustre: Skipped 307 previous similar messages Jul 27 09:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 09:24:08 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 09:26:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 09:26:59 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 27 09:28:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 09:28:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 09:31:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 09:31:01 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 27 09:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 09:34:45 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 09:37:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 09:37:02 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 27 09:41:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 09:41:12 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 27 09:41:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 09:41:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 25633:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=39 reqQ=0 recA=39, svcEst=1, delay=5916 Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 7 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (6): c: 5, oc: 0, rc: 8 Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 7 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 20541:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564245716/real 1564245717] req@ffff8f0ce982c500 x1636747530353744/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 1 dl 1564245723 ref 2 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 30993:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f33e3991800 x1631306728381584/t0(0) o103->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 30993:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 23580:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0b1b238900 x1637150940605520/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:2/0 lens 600/3264 e 0 to 0 dl 1564245722 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 23580:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 22973:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. req@ffff8f323e594050 x1638825191110608/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:2/0 lens 488/408 e 0 to 0 dl 1564245722 ref 2 fl Complete:/0/0 rc 131072/131072 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f1e1d253050 x1631632190024144/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564245722 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 6549:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f31eabd2450 x1638876179268192/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564245723 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 6549:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 09:42:04 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds Jul 27 09:42:04 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3cd6637600 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 
Skipped 1 previous similar message Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1f4830a000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2e53f55c00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f135f380c00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f16e6092000 Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 22670:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.28.12@o2ib6 from 10.0.10.51@o2ib7 Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 26 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: LNetError: 22670:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 23 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ab0909a00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 22670:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f312fb49000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 46511:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f287dbac800 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 23106:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2a432ce600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 22975:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f312fb4f600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f312fb4ac00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 69437:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f36b23c0e00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 21535:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2e53f55a00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 6547:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f287dbad800 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44f50ada00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44ffa4e800 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f35baac0600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f14a39b2200 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2e53f55000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 44039:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f170632e600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 27580:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f248dad8c00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a432cf800 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b765db800 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4321ad9a00 Jul 27 09:42:04 fir-md1-s1 kernel: 
LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2e53f50400 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f312fb4a200 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ffa49600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f248dade200 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f09c4f0d000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4321adea00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f29ee837a00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0554405e00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f14a39b0600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 46565:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f237555f000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e53f51600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ab090b800 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ffa4f000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ffa4a400 Jul 27 09:42:04 fir-md1-s1 
kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.107@o2ib7: connected Jul 27 09:42:04 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 3 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f41d4313c00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f41d4315400 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3c597d2400 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44bc67ac00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ffa4c000 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1d01175c50 x1638886265539568/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564245737 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44ffa4cc00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44cb21b200 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2e53f53400 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f248daddc00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 
20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2e53f57a00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44cb21c400 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44bc67a600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44cb21fc00 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44bc67b600 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44cb218400 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20995:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f0aa9d30600 x1633662388346000/t0(0) o37->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:2/0 lens 448/440 e 0 to 0 dl 1564245722 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:04 fir-md1-s1 kernel: LustreError: 20995:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 28 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564245717/real 1564245723] req@ffff8f39aef06c00 x1636747530353888/t0(0) o13->fir-OST0013-osc-MDT0000@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564245724 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564245717/real 1564245723] req@ffff8f3b592c5a00 x1636747530354224/t0(0) o13->fir-OST0005-osc-MDT0002@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564245724 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 
20240:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 44 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 45 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: fir-OST000c-osc-MDT0002: Connection to fir-OST000c (at 10.0.10.103@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: fir-MDT0003-osp-MDT0000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc = -110 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=27 reqQ=0 recA=1, svcEst=7, delay=5916 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 3 previous similar messages Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f13742ab050 x1637106951496304/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564245723 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 09:42:04 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Jul 27 09:42:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 11f7dba6-7171-5836-2062-1974c5637c6a (at 10.8.28.11@o2ib6), client will retry: rc -110 Jul 27 09:42:05 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 27 09:42:05 fir-md1-s1 kernel: LustreError: 46515:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f31eabd3050 x1634181553104000/t0(0) o4->92b76833-e0a6-d520-474e-2227f356d2b3@10.9.109.61@o2ib4:23/0 lens 488/448 e 1 to 0 dl 1564245743 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:05 fir-md1-s1 kernel: LustreError: 46515:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 25 previous similar messages Jul 27 09:42:07 fir-md1-s1 kernel: Lustre: 20246:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564245719/real 1564245723] req@ffff8f3b592c6f00 x1636747530354352/t0(0) o13->fir-OST0020-osc-MDT0002@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564245726 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:42:07 fir-md1-s1 kernel: Lustre: fir-OST0022-osc-MDT0000: Connection to fir-OST0022 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:42:07 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 09:42:07 fir-md1-s1 kernel: Lustre: 20246:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages Jul 27 09:42:07 fir-md1-s1 kernel: LustreError: 20505:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+5s req@ffff8f31eabd0850 x1639509962383488/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:2/0 lens 488/440 e 0 
to 0 dl 1564245722 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:07 fir-md1-s1 kernel: Lustre: 46522:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. req@ffff8f2ea5798050 x1638825191110512/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564245722 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 09:42:07 fir-md1-s1 kernel: Lustre: 46522:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 26 previous similar messages Jul 27 09:42:07 fir-md1-s1 kernel: LustreError: 20505:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 09:42:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with b37c54be-7fed-724b-d760-c5bd71b2a4e0 (at 10.8.29.5@o2ib6), client will retry: rc = -110 Jul 27 09:42:09 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 27 09:42:09 fir-md1-s1 kernel: Lustre: fir-OST0023-osc-MDT0002: Connection to fir-OST0023 (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 09:42:09 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 27 09:42:10 fir-md1-s1 kernel: LustreError: 46512:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f323e591050 x1638886265539296/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564245743 ref 1 fl Interpret:/2/0 rc 0/0 Jul 27 09:42:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 27 09:42:10 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 09:42:10 fir-md1-s1 kernel: LustreError: 46512:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 12 previous similar messages Jul 27 09:42:11 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1564245723/real 1564245723] req@ffff8f379aa3ec00 x1636747530355632/t0(0) o13->fir-OST001c-osc-MDT0002@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564245730 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 09:42:11 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Jul 27 09:42:12 fir-md1-s1 kernel: Lustre: 46557:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f31eabd5450 x1639509962383856/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564245737 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:12 fir-md1-s1 kernel: Lustre: 46557:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Jul 27 09:42:17 fir-md1-s1 kernel: LustreError: 56756:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f38dc7ec850 x1631632190024976/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564245737 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:17 fir-md1-s1 kernel: LustreError: 56756:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 27 09:42:18 fir-md1-s1 kernel: Lustre: 22975:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ea579bc50 x1638876179269152/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564245743 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 09:42:18 fir-md1-s1 kernel: Lustre: 22975:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 28 previous similar messages Jul 27 09:42:23 fir-md1-s1 kernel: LustreError: 21516:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1d01172850 x1638831333618944/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564245743 ref 1 fl Interpret:/0/0 rc 0/0 
Jul 27 09:42:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 27 09:42:23 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 27 09:42:23 fir-md1-s1 kernel: LustreError: 21516:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 26 previous similar messages Jul 27 09:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 09:44:53 fir-md1-s1 kernel: Lustre: Skipped 944 previous similar messages Jul 27 09:47:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 09:47:03 fir-md1-s1 kernel: Lustre: Skipped 1526 previous similar messages Jul 27 09:51:59 fir-md1-s1 kernel: LustreError: 21616:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1025401450 x1631306774257488/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:26/0 lens 488/440 e 0 to 0 dl 1564246346 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 09:51:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 5c9f5376-a105-7e2f-1c52-759657f6fd7d (at 10.9.101.59@o2ib4), client will retry: rc -107 Jul 27 09:51:59 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 09:52:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 09:52:50 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 27 09:54:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 09:54:09 fir-md1-s1 kernel: Lustre: Skipped 492 previous similar messages Jul 27 09:54:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 09:54:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 09:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 09:57:19 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 27 10:03:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 10:03:42 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 10:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 10:05:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 10:07:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 10:07:42 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 10:08:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 10:08:39 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 27 10:13:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 10:13:48 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 10:16:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 10:16:36 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 20204:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564247825/real 1564247825] req@ffff8f0cc3313000 x1636747556527856/t0(0) o6->fir-OST0008-osc-MDT0000@10.0.10.101@o2ib7:28/4 lens 544/432 e 0 to 1 dl 1564247832 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 46563:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=18 reqQ=0 recA=29, svcEst=20, delay=6363 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 46563:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 23573:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0d9feb3c00 x1637151168867424/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:11/0 lens 2328/3264 e 0 to 0 dl 1564247831 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 31003:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f2b65b6e000 x1634122324980448/t0(0) o103->b37c54be-7fed-724b-d760-c5bd71b2a4e0@10.8.29.5@o2ib6:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: fir-OST0008-osc-MDT0000: Connection to fir-OST0008 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 23573:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 26 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 31003:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 4 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 23760:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8f2f4267ec00 x1631581052786496/t0(0) o101->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:11/0 lens 584/1168 e 0 to 0 dl 1564247831 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 23760:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 46557:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f304dc71450 x1638886293914672/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:11/0 lens 488/440 e 0 to 0 dl 1564247831 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 27 10:17:13 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 10:17:13 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 15 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0f20cdb400 Jul 27 10:17:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 384 seconds Jul 27 10:17:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 138 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 21039:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f27449f8450 x1638906576443808/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564247832 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 21742:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3d86693850 x1638877342532736/t0(0) 
o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564247845 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 21742:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0f20cda400 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f430c213600 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4430b64400 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f449fa7a000 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f4430b64e00 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f31c079f800 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f430c214e00 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4430b65e00 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 2e1cbf07-8a1f-f8ac-959f-a318bdca8802 (at 10.9.105.18@o2ib4), client will retry: rc = -110 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f323be53200 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2fa43d3000 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 
20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1612a96600 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0f20cde600 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1f084fe000 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0f20cdca00 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4430b64c00 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f084f9c00 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32c2442e00 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 21043:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=19 reqQ=0 recA=18, svcEst=20, delay=6299 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 21043:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 3 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 21043:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f2f2217b850 x1638086585917280/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:12/0 lens 488/408 e 0 to 0 dl 1564247832 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 21043:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 71833:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0eace8ce00 x1633662394566432/t0(0) o37->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:11/0 lens 448/440 e 0 to 0 dl 1564247831 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:13 fir-md1-s1 kernel: LustreError: 71833:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 13 previous similar messages Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 71833:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. req@ffff8f0eace8ce00 x1633662394566432/t0(0) o37->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:11/0 lens 448/408 e 0 to 0 dl 1564247831 ref 1 fl Complete:/0/0 rc -107/-107 Jul 27 10:17:13 fir-md1-s1 kernel: Lustre: 71833:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jul 27 10:17:14 fir-md1-s1 kernel: LustreError: 20501:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0e38abe850 x1639235623517936/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:5/0 lens 488/440 e 0 to 0 dl 1564247855 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:14 fir-md1-s1 kernel: LustreError: 20501:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 11 previous similar messages Jul 27 10:17:16 fir-md1-s1 kernel: LustreError: 46513:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+5s req@ffff8f2601fc4450 x1638250768850320/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:11/0 lens 488/440 e 0 to 0 dl 1564247831 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:16 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -110 Jul 27 10:17:16 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 10:17:16 fir-md1-s1 kernel: Lustre: 24213:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. req@ffff8f0940eaa850 x1638086585917024/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:11/0 lens 488/440 e 0 to 0 dl 1564247831 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 10:17:16 fir-md1-s1 kernel: Lustre: 24213:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 10:17:16 fir-md1-s1 kernel: LustreError: 46513:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 21 previous similar messages Jul 27 10:17:17 fir-md1-s1 kernel: LustreError: 68193:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3959609c50 x1634125788638128/t0(0) o4->437db638-1a8f-d9e7-3d4a-b386602e77f0@10.9.102.35@o2ib4:29/0 lens 488/448 e 1 to 0 dl 1564247849 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:17 fir-md1-s1 kernel: LustreError: 68193:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 27 10:17:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 437db638-1a8f-d9e7-3d4a-b386602e77f0 (at 10.9.102.35@o2ib4), client will retry: rc = -110 Jul 27 10:17:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c4a74d2b-de98-9a37-7ebb-5f19657dadd1 (at 10.9.108.2@o2ib4), client will retry: rc = -110 Jul 27 10:17:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 10:17:21 fir-md1-s1 kernel: Lustre: 21742:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f395960a850 x1638250768850208/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564247845 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 
10:17:21 fir-md1-s1 kernel: Lustre: 21742:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 27 10:17:22 fir-md1-s1 kernel: Lustre: 21364:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3d86694850 x1638899557539904/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564247846 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:22 fir-md1-s1 kernel: Lustre: 21364:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Jul 27 10:17:25 fir-md1-s1 kernel: LustreError: 46582:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f395960b850 x1638930269968352/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564247845 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:25 fir-md1-s1 kernel: LustreError: 46582:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 10:17:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 27 10:17:25 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 27 10:17:27 fir-md1-s1 kernel: Lustre: 24070:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f0ed74f2050 x1638899557540112/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564247846 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 10:17:27 fir-md1-s1 kernel: Lustre: 24070:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 10:17:27 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f22178850 x1639510058241440/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1564247852 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:27 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 27 10:17:31 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f2217b050 x1640014600735952/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564247856 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:31 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 27 10:17:32 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2384389050 x1638876248923904/t0(0) o3->97481f17-b98d-0828-17b9-32f14b205b6e@10.9.114.13@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1564247852 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:32 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 23 previous similar messages Jul 27 10:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a687dd21-1bbe-233b-d907-3cc9986eac5f (at 10.9.103.28@o2ib4), client will retry: rc = -110 Jul 27 10:17:41 fir-md1-s1 kernel: LustreError: 21740:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3d86694450 
x1634171842020256/t0(0) o4->2bf2f2dc-a45e-d531-35af-47b63ad114e3@10.9.109.48@o2ib4:9/0 lens 488/448 e 0 to 0 dl 1564247859 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:41 fir-md1-s1 kernel: LustreError: 21740:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 27 10:17:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 2bf2f2dc-a45e-d531-35af-47b63ad114e3 (at 10.9.109.48@o2ib4), client will retry: rc = -110 Jul 27 10:17:41 fir-md1-s1 kernel: Lustre: 21740:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f3d86694450 x1634171842020256/t0(0) o4->2bf2f2dc-a45e-d531-35af-47b63ad114e3@10.9.109.48@o2ib4:9/0 lens 488/448 e 0 to 0 dl 1564247859 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 10:17:41 fir-md1-s1 kernel: Lustre: 21740:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 10:17:41 fir-md1-s1 kernel: LustreError: 46525:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+5s req@ffff8f283196fc50 x1631581052786720/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564247856 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:41 fir-md1-s1 kernel: LustreError: 46525:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 20 previous similar messages Jul 27 10:17:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6), client will retry: rc -110 Jul 27 10:17:41 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 27 10:17:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7f7955a1-3f4a-c312-2133-102ef81ed310 (at 10.8.27.22@o2ib6) Jul 27 10:17:43 fir-md1-s1 kernel: Lustre: Skipped 916 previous similar messages Jul 27 10:17:51 fir-md1-s1 kernel: Lustre: 23627:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f2a84cbc200 x1631558358702400/t0(0) o101->a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0@10.8.8.32@o2ib6:26/0 lens 576/3264 e 1 to 0 dl 1564247876 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:17:51 fir-md1-s1 kernel: Lustre: 23627:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Jul 27 10:18:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.102.43@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0ba8e7b180/0x5d9ee68c0b69b069 lrc: 3/0,0 mode: PR/PR res: [0x2c002bdde:0xc00c:0x0].0x0 bits 0x13/0x0 rrc: 443 type: IBT flags: 0x60200400000020 nid: 10.9.102.43@o2ib4 remote: 0xdcbab3fdca3fc55d expref: 5859 pid: 23556 timeout: 3362944 lvb_type: 0 Jul 27 10:18:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 27 10:18:06 fir-md1-s1 kernel: Lustre: 21145:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f15350bd100 x1634125788674064/t0(0) o101->437db638-1a8f-d9e7-3d4a-b386602e77f0@10.9.102.35@o2ib4:5/0 lens 576/536 e 0 to 0 dl 1564247885 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 10:18:06 fir-md1-s1 kernel: Lustre: 21145:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 10:18:09 fir-md1-s1 kernel: Lustre: 21415:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26e08bd100 x1634528251533936/t0(0) o101->2ee51d45-426d-bbd9-5b4f-485a0917e8b9@10.8.17.18@o2ib6:14/0 lens 576/3264 e 1 to 0 dl 1564247894 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:18:09 fir-md1-s1 kernel: Lustre: 21415:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Jul 27 10:18:53 fir-md1-s1 kernel: Lustre: 18781:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f07a008a050 x1637107003707040/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564247938 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 10:20:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.28.11@o2ib6, removing former export from same NID Jul 27 10:20:12 fir-md1-s1 kernel: Lustre: Skipped 301 previous similar messages Jul 27 10:26:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 10:26:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 10:27:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 10:27:19 fir-md1-s1 kernel: Lustre: Skipped 664 previous similar messages Jul 27 10:28:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 10:28:33 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Jul 27 10:30:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 10:30:18 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 10:37:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 10:37:43 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 27 10:38:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 10:38:09 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 10:39:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 10:39:19 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 27 10:40:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 27 10:40:57 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 27 10:49:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 10:49:03 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 10:49:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 10:49:27 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 27 10:51:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 10:51:06 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 27 10:53:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 10:53:05 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 10:59:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 10:59:34 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 27 10:59:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 10:59:56 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 11:01:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 11:01:15 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 11:06:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 11:06:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 11:09:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 11:09:34 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 27 11:10:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 11:10:44 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 27 11:12:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 11:12:33 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 11:17:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 11:17:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 11:19:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 11:19:41 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 27 11:20:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 11:20:46 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 11:22:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 11:22:36 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 27 11:30:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 11:30:09 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 27 11:30:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 11:30:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 11:30:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 11:30:58 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 11:33:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 11:33:21 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 27 11:40:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 11:40:12 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 11:40:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 11:40:59 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 11:42:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 11:42:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 11:45:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 11:45:17 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 11:50:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 11:50:48 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 27 11:51:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 11:51:28 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 11:55:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 11:55:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 11:56:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 11:56:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 12:01:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 12:01:04 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 12:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 12:01:43 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 12:07:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 27 12:07:22 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 12:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 12:11:21 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 12:12:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 12:12:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 12:12:05 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 12:12:05 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 12:19:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 12:19:47 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 27 12:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 12:21:44 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 27 12:22:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 12:22:07 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 12:22:39 fir-md1-s1 kernel: Lustre: 21683:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a18455450 x1639156281901504/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564255364 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 12:23:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 12:23:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 12:30:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 12:30:12 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 12:31:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 12:31:47 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 27 12:32:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 12:32:17 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 12:38:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 12:38:52 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 12:41:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 12:41:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 27 12:41:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 12:41:49 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 12:42:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 12:42:25 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 12:45:55 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 27 12:45:55 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages Jul 27 12:49:59 fir-md1-s1 kernel: LustreError: 
137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 12:49:59 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 12:51:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 12:51:25 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 27 12:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 12:51:58 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 27 12:52:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 12:52:29 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 13:01:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 13:01:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 13:02:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 13:02:03 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 13:02:38 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257751/real 1564257751] req@ffff8f28de4c9200 x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257758 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 13:02:38 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 27 13:02:45 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257758/real 1564257758] req@ffff8f28de4c9200 
x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257765 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:02:52 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257765/real 1564257765] req@ffff8f28de4c9200 x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257772 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:02:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 13:02:53 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 13:02:56 fir-md1-s1 kernel: Lustre: 23676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2692f81b00 x1636469117691760/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:1/0 lens 480/568 e 0 to 0 dl 1564257781 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 13:02:59 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257772/real 1564257772] req@ffff8f28de4c9200 x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257779 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:03:06 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257779/real 1564257779] req@ffff8f28de4c9200 x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257786 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:03:06 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 27 13:03:20 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257793/real 1564257793] req@ffff8f28de4c9200 
x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257800 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:03:20 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 27 13:03:27 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-14), not sending early reply req@ffff8f1616706600 x1636469118151952/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:2/0 lens 480/568 e 0 to 0 dl 1564257812 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 13:03:42 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564257814/real 1564257814] req@ffff8f28de4c9200 x1636747638223440/t0(0) o106->fir-MDT0000@10.8.15.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564257821 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:03:42 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jul 27 13:03:56 fir-md1-s1 kernel: LustreError: 23665:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.3@o2ib6) returned error from glimpse AST (req@ffff8f28de4c9200 x1636747638223440 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2d7d8f69c0/0x5d9ee68cb45efca5 lrc: 4/0,0 mode: PW/PW res: [0x2000222f5:0x2c5:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.15.3@o2ib6 remote: 0xc36b1974818f6889 expref: 44 pid: 23584 timeout: 0 lvb_type: 0 Jul 27 13:03:56 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.15.3@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Jul 27 13:03:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 117s: evicting client at 10.8.15.3@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2d7d8f69c0/0x5d9ee68cb45efca5 lrc: 4/0,0 mode: PW/PW res: 
[0x2000222f5:0x2c5:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.15.3@o2ib6 remote: 0xc36b1974818f6889 expref: 45 pid: 23584 timeout: 0 lvb_type: 0 Jul 27 13:05:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 13:05:05 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 13:06:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 957c1ad0-d547-b44d-0f14-5f92c3213a3d (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee5543000, cur 1564257960 expire 1564257810 last 1564257733 Jul 27 13:12:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 13:12:18 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 27 13:12:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 13:12:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 27 13:13:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 13:13:10 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 27 13:17:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 13:17:30 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21540:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=51 reqQ=0 recA=29, svcEst=1, delay=9478 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21540:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f2537e1ec50 x1638086918828832/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:27/0 lens 488/0 e 0 to 0 dl 1564258767 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 46524:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.112.6@o2ib4: deadline 6:4s ago req@ffff8f210d326850 x1638831641085344/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:27/0 lens 488/0 e 0 to 0 dl 1564258767 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21540:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 46524:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 46580:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:4s); client may timeout. 
req@ffff8f161e1d7c50 x1638881165220768/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:27/0 lens 488/0 e 0 to 0 dl 1564258767 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 25080:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 10s req@ffff8f2254b60c50 x1631308323824016/t0(0) o103->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 46580:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 25080:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 8 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 4 seconds Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 16 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (10): c: 5, oc: 1, rc: 8 Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 16 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 23690:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564258761/real 1564258761] req@ffff8f403bd3fb00 x1636747646211936/t0(0) o104->fir-MDT0000@10.9.101.59@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564258768 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 23690:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 13:19:32 fir-md1-s1 
kernel: Lustre: Skipped 2 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 46568:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -4+4s req@ffff8f2414ff0850 x1639510531541280/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564258767 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 46568:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18aa684c00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ec9dac00 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f445e768400 Jul 27 13:19:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.209@o2ib7: 4 seconds Jul 27 13:19:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 9 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f18aa684400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f07f3bcfc00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f3946636800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44ec9dd200 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3f0728a200 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f38a4b80a00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f445e76a200 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f445e76b400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44ec9de400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f07f3bce000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f07f3bcd400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2f6d86dc00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f336f68d000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e44f0ee00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3f07289800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f445e768200 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, 
desc ffff8f2ee6587600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f44ec9d8600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0789e79400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2df53c9600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2ee6587600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2b1628cc00 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 7e2f1b77-e605-f279-b45d-e428b3d96daf (at 10.9.101.3@o2ib4), client will retry: rc = -110 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f27c392a800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f336f68c800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18442a6800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3f56ba1000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f6d86dc00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f445e768200 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0799218600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 
20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38a4b82800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18aa685c00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d30eee600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18aa683400 Jul 27 13:19:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.210@o2ib7: connected Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3647ab8400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0e0330a400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0bc6f24e00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2fb95a5800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ec9ddc00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c222400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df2d46400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c223400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0743735600 Jul 27 13:19:32 fir-md1-s1 kernel: 
LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0743734400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0743737a00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44e5b74800 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3f073b0400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2697faee00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3946635c00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2697faa000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4506c15600 Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 46511:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.9.101.59@o2ib4 from 10.0.10.51@o2ib7 Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 46511:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 12 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f217ee10400 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c227e00 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c222000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f18442a1600 Jul 27 13:19:32 
fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f336f68f000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1e44f0a000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 44040:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1c7f65f000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 46510:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f336f68f000 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 21743:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f4506c15600 Jul 27 13:19:32 fir-md1-s1 kernel: LustreError: 20500:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f36d4f9d600 Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 23728:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.28.7@o2ib6 from 10.0.10.51@o2ib7 Jul 27 13:19:32 fir-md1-s1 kernel: LNetError: 23728:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 67 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21890:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=10, delay=8829 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21890:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 14 previous similar messages Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21890:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f4214510300 x1638280945477024/t0(0) o37->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:27/0 lens 448/408 e 0 to 0 dl 1564258767 ref 1 fl Complete:/0/0 rc 0/0 Jul 27 13:19:32 fir-md1-s1 kernel: Lustre: 21890:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 184 previous similar messages Jul 27 13:19:36 fir-md1-s1 kernel: LustreError: 24070:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+9s req@ffff8f13049e6050 x1638877645323776/t0(0) o3->c4566649-5001-d956-15cb-934d725d7f29@10.9.113.11@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564258767 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 13:19:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 59f5c312-adc4-b4a9-05e0-8c37d188c47f (at 10.9.112.13@o2ib4), client will retry: rc -110 Jul 27 13:19:36 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 27 13:19:36 fir-md1-s1 kernel: Lustre: 42895:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:9s); client may timeout. 
req@ffff8f20b1ace050 x1639298856164752/t0(0) o3->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564258767 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 13:19:36 fir-md1-s1 kernel: Lustre: 42895:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 173 previous similar messages Jul 27 13:19:36 fir-md1-s1 kernel: LustreError: 24070:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 78 previous similar messages Jul 27 13:19:38 fir-md1-s1 kernel: Lustre: 23690:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564258771/real 1564258771] req@ffff8f403bd3fb00 x1636747646211936/t0(0) o104->fir-MDT0000@10.9.101.59@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564258778 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 13:19:38 fir-md1-s1 kernel: Lustre: 23690:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15 previous similar messages Jul 27 13:19:46 fir-md1-s1 kernel: Lustre: 21716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2537e1b050 x1637107219337760/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564258791 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 13:19:51 fir-md1-s1 kernel: LustreError: 24568:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f29e626ec50 x1638881165221104/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:21/0 lens 488/440 e 1 to 0 dl 1564258791 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 13:19:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6), client will retry: rc -110 Jul 27 13:19:51 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 13:19:51 fir-md1-s1 kernel: LustreError: 24568:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 27 13:19:56 fir-md1-s1 kernel: Lustre: 
21311:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-15), not sending early reply req@ffff8f143cdb2d00 x1637152381878368/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:1/0 lens 600/3264 e 0 to 0 dl 1564258801 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 13:19:56 fir-md1-s1 kernel: Lustre: 21311:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 27 13:20:03 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3496215400 x1636429275159072/t0(0) o36->304180e1-aa68-a4a4-ed4c-9536f53351a5@10.8.1.21@o2ib6:8/0 lens 504/2888 e 0 to 0 dl 1564258808 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 13:20:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 46s: evicting client at 10.9.101.59@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1f46d6e780/0x5d9ee68cc62bd96d lrc: 3/0,0 mode: PW/PW res: [0x200029bf6:0x350c:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.101.59@o2ib4 remote: 0x432adc782664c675 expref: 16 pid: 20460 timeout: 3373867 lvb_type: 0 Jul 27 13:20:07 fir-md1-s1 kernel: LustreError: 23690:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f45182d1800 ns: mdt-fir-MDT0000_UUID lock: ffff8f0a15d40fc0/0x5d9ee68cc62c6b8d lrc: 3/0,0 mode: PW/PW res: [0x200029bf6:0x350c:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.9.101.59@o2ib4 remote: 0x432adc782665155f expref: 11 pid: 23690 timeout: 0 lvb_type: 0 Jul 27 13:20:07 fir-md1-s1 kernel: Lustre: 23690:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:16s); client may timeout. 
req@ffff8f403bd3e000 x1631308323866032/t0(0) o101->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:21/0 lens 480/536 e 0 to 0 dl 1564258791 ref 1 fl Complete:/0/0 rc -107/-107 Jul 27 13:20:07 fir-md1-s1 kernel: Lustre: 23690:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 27 13:20:08 fir-md1-s1 kernel: LustreError: 23749:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2629c34500 x1636747646289584/t0(0) o104->fir-MDT0002@10.9.102.40@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 27 13:22:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 13:22:37 fir-md1-s1 kernel: Lustre: Skipped 276 previous similar messages Jul 27 13:22:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 13:22:37 fir-md1-s1 kernel: Lustre: Skipped 968 previous similar messages Jul 27 13:23:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 13:23:47 fir-md1-s1 kernel: Lustre: Skipped 679 previous similar messages Jul 27 13:31:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 13:31:25 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 27 13:32:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 13:32:43 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 27 13:32:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 13:32:43 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 27 13:34:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 13:34:03 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 13:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 13:43:22 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 27 13:44:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 13:44:31 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 13:44:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 13:44:34 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 13:47:13 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0b3ae71450 x1638831681428672/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564260438 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 13:50:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 13:50:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 13:53:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 13:53:29 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 27 13:54:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 13:54:32 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 13:54:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 13:54:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 14:01:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 14:01:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 14:03:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 14:03:32 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 27 14:04:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 14:04:46 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 14:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 14:05:52 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 14:11:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 14:11:36 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 14:13:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 14:13:35 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 27 14:15:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 14:15:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 14:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 14:16:02 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 14:20:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 49530de5-f172-5bb3-a0d3-bd0ce56d3339 (at 10.8.7.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fe4eb400, cur 1564262443 expire 1564262293 last 1564262216 Jul 27 14:20:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 14:23:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 14:23:51 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 14:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 14:26:05 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 14:26:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 14:26:53 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 27 14:28:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 14:33:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 14:33:52 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 27 14:36:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 14:36:57 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 14:37:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 14:37:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 14:38:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 14:38:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 14:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 14:44:24 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 14:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 14:47:24 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 14:47:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 14:47:49 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 27 14:54:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 14:54:26 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 27 14:57:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 27 14:57:03 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 27 14:57:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 14:57:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 27 14:59:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 27 14:59:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 27 15:04:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 15:04:29 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 27 15:08:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 15:08:57 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 15:09:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 15:09:09 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 15:11:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 15:11:47 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 27 15:14:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 15:14:42 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 27 15:19:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 15:19:57 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 15:21:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 15:21:57 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 27 15:22:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 15:22:21 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 15:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 15:24:56 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 27 15:26:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f179b06cc00, cur 1564266368 expire 1564266218 last 1564266141 Jul 27 15:26:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 15:30:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 15:30:39 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 15:34:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 15:34:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 15:35:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 15:35:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 15:36:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 15:36:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 15:41:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 15:41:47 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 27 15:45:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 15:45:18 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 27 15:46:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 15:46:24 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 27 15:46:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 15:46:55 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 15:52:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 15:52:00 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 15:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 15:55:28 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 15:58:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 15:58:16 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 27 16:00:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 16:00:05 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 16:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 16:02:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 27 16:05:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 16:05:45 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 27 16:08:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 16:08:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 27 16:11:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 16:11:22 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 16:12:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 16:12:23 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 16:15:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 16:15:54 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 27 16:18:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 16:18:45 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 27 16:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b1ea8bac-bbd4-281f-5e6b-8cf3be7d5b02 (at 10.8.11.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f30a4b25c00, cur 1564269641 expire 1564269491 last 1564269414 Jul 27 16:20:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b1ea8bac-bbd4-281f-5e6b-8cf3be7d5b02 (at 10.8.11.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f033c6000, cur 1564269644 expire 1564269494 last 1564269417 Jul 27 16:20:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 16:22:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 16:22:43 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 16:23:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 16:23:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 16:26:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 16:26:02 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 27 16:29:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 16:29:49 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 16:32:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 16:32:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 16:33:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 16:33:35 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 16:36:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 16:36:40 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 16:41:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 16:41:05 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 27 16:43:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 16:43:43 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 27 16:46:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 16:46:52 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 27 16:48:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 16:48:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 16:52:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 16:52:05 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 27 16:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 16:54:17 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 16:55:45 fir-md1-s1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0003 address=0xfffffffdf8230000 flags=0x0008] Jul 27 16:57:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 16:57:33 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 27 16:59:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 16:59:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 17:03:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 17:03:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 27 17:04:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 17:04:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 17:07:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 17:07:41 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 27 17:12:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 17:12:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 17:13:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 17:13:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 27 17:15:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 17:15:11 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 17:17:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 17:17:42 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 27 17:23:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 17:23:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 17:24:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 17:24:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 17:25:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 17:25:42 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 17:28:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 17:28:37 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 17:34:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 17:34:03 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 27 17:36:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 17:36:04 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 17:38:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 17:38:39 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 27 17:44:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 17:44:54 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 27 17:46:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 17:46:11 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 27 17:48:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 17:48:44 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 27 17:50:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 17:50:58 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 17:53:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 17:55:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d8966a4a-4d47-9263-5c59-def341457a39 (at 10.8.11.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520a61400, cur 1564275356 expire 1564275206 last 1564275129 Jul 27 17:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 17:56:20 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 17:57:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 17:57:13 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 17:58:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 17:58:45 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 27 18:01:26 fir-md1-s1 kernel: Lustre: 24213:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0522a97050 x1631633342691312/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:1/0 lens 488/440 e 1 to 0 dl 1564275691 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 18:02:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ae1c09c00, cur 1564275739 expire 1564275589 last 1564275512 Jul 27 18:02:19 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 27 18:05:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 500d79fb-6050-0ac2-c2b9-bb5a2cdcafd9 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2765eee000, cur 1564275908 expire 1564275758 last 1564275681 Jul 27 18:05:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 18:05:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 18:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 18:06:22 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 18:08:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 18:08:31 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 18:08:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 18:08:48 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 27 18:16:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 18:16:37 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 18:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 18:19:07 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 27 18:19:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 27 18:19:45 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar 
messages Jul 27 18:20:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f0428d50-4566-1843-edbe-bdc8dbd6d1bb (at 10.8.19.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24fe4e9c00, cur 1564276836 expire 1564276686 last 1564276609 Jul 27 18:20:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 18:26:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 18:26:48 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 18:27:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 18:27:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 18:28:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 06a6044d-3fbb-8d92-2de5-ca0f5efa3681 (at 10.8.31.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ea3b5800, cur 1564277286 expire 1564277136 last 1564277059 Jul 27 18:28:06 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 27 18:29:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 18:29:27 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 27 18:29:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 18:29:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 27 18:31:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 18:31:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2627d204-0bac-227d-c734-11b61ebefe17 (at 10.8.21.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f4898000, cur 1564277516 expire 1564277366 last 1564277289 Jul 27 18:31:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 18:35:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 18:35:48 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 18:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 18:37:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 18:37:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8ffa168b-114f-3abd-06c6-50410b37ddf7 (at 10.8.23.11@o2ib6) in 191 seconds. I think it's dead, and I am evicting it. exp ffff8f34e9c92400, cur 1564277878 expire 1564277728 last 1564277687 Jul 27 18:37:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 18:38:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5f3e0ce7-a67a-931b-9f13-5759664ab6ae (at 10.8.23.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250180b000, cur 1564277914 expire 1564277764 last 1564277687 Jul 27 18:38:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 18:39:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 18:39:28 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 27 18:41:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 18:41:23 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 27 18:47:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d4d733ff-8d4b-d8de-bbc6-b5ae7cc529ba (at 10.8.12.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2507824400, cur 1564278473 expire 1564278323 last 1564278246 Jul 27 18:47:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 18:47:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 18:47:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 18:47:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 18:47:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 27 18:49:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f6ea7672-8868-9f5a-78cf-51eceeedc9da (at 10.8.23.14@o2ib6) in 226 seconds. I think it's dead, and I am evicting it. 
exp ffff8f339f39c800, cur 1564278549 expire 1564278399 last 1564278323 Jul 27 18:49:09 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 27 18:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 18:49:35 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 27 18:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6b93cfa8-f9b6-53af-0489-a4db82539e08 (at 10.8.23.5@o2ib6) in 226 seconds. I think it's dead, and I am evicting it. exp ffff8f2502d55c00, cur 1564278625 expire 1564278475 last 1564278399 Jul 27 18:50:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 18:51:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 18:51:36 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 18:51:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 01bbc775-4337-911b-0587-f02509a82a71 (at 10.8.22.27@o2ib6) in 203 seconds. I think it's dead, and I am evicting it. exp ffff8f252173d800, cur 1564278701 expire 1564278551 last 1564278498 Jul 27 18:51:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 18:56:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 18:56:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 18:58:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 18:58:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 27 18:59:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 18:59:45 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 27 19:01:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 19:01:36 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 27 19:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 19:09:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 19:09:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 19:09:51 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 27 19:09:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24692c0000, cur 1564279793 expire 1564279643 last 1564279566 Jul 27 19:09:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 27 19:10:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 19:10:19 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 19:11:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 19:11:41 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 27 19:19:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 27 19:19:46 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 19:19:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 19:19:54 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 27 19:21:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 19:21:43 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 27 19:22:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 19:22:29 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 27 19:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 19:30:06 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 27 19:31:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 19:31:02 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 19:33:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 19:33:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 19:34:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 27 19:34:13 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 27 19:40:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 19:40:11 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 27 19:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 19:41:26 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 19:43:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 19:43:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 19:44:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 19:44:21 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 27 19:47:20 fir-md1-s1 kernel: Lustre: 21451:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0bfbc6f050 x1631312475143120/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:25/0 lens 488/4536 e 1 to 0 dl 1564282045 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 19:47:26 fir-md1-s1 kernel: Lustre: 35241:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f0bfbc6f050 x1631312475143120/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:25/0 lens 488/632 e 1 to 0 dl 1564282045 ref 1 fl Complete:/0/0 rc 217/217 Jul 27 19:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 19:50:12 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 27 19:51:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 19:51:57 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 19:54:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 19:54:27 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 27 20:00:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 20:00:38 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 20:02:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 20:02:09 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 20:04:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 20:04:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 20:04:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 20:04:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 20:05:32 fir-md1-s1 kernel: Lustre: 23692:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f31960a6600 x1639438943758960/t0(0) o101->f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b@10.8.0.65@o2ib6:7/0 lens 1768/3288 e 1 to 0 dl 1564283137 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:10:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 20:10:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 27 20:12:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 20:12:42 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 27 20:14:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 20:14:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 20:15:37 fir-md1-s1 kernel: Lustre: 22427:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0cf93a8450 x1638900867690864/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564283742 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:18:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 20:18:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 27 20:18:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 20:18:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 20:20:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 27 20:20:46 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 27 20:22:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 20:22:54 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 27 20:27:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 20:27:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 20:28:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 20:28:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 27 20:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 20:31:37 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 27 20:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 20:33:19 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 27 20:39:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 20:39:48 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 20:40:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 20:40:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 20:41:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 20:41:39 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 27 20:44:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 20:44:04 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 27 20:44:25 fir-md1-s1 kernel: LustreError: 23036:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f38674a1c50 x1638251819124736/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:25/0 lens 488/440 e 0 to 0 dl 1564285465 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 2f627314-68e3-35d2-70d7-0cd2604dd048 (at 10.9.115.4@o2ib4), client will retry: rc -110 Jul 27 20:44:25 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 27 20:44:25 fir-md1-s1 kernel: LustreError: 23036:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 20:44:27 fir-md1-s1 kernel: LustreError: 24571:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f27dffe5050 x1631589657594704/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:20/0 lens 488/440 e 0 to 0 dl 1564285490 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 27 20:44:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with baaf9aa6-d6ac-d219-ff91-f47dd67dd412 (at 10.8.29.6@o2ib6), client will retry: rc = -110 Jul 27 20:44:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 20:44:32 fir-md1-s1 kernel: LustreError: 46587:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk 
READ req@ffff8f1f5b7fa050 x1638887936388816/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:19/0 lens 488/440 e 0 to 0 dl 1564285489 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:32 fir-md1-s1 kernel: LustreError: 46587:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 27 20:44:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 27 20:44:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 27 20:44:32 fir-md1-s1 kernel: Lustre: 20511:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564285465/real 1564285465] req@ffff8f165e531e00 x1636747910781312/t0(0) o106->fir-MDT0002@10.8.17.25@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564285472 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 20:44:34 fir-md1-s1 kernel: Lustre: 46517:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f21c4dc50 x1633754550415024/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:9/0 lens 488/440 e 1 to 0 dl 1564285479 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:35 fir-md1-s1 kernel: Lustre: 46528:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f21c48450 x1639837452881008/t0(0) o4->6052c41b-9004-bcc3-dbad-bff4bc2f2f04@10.8.14.5@o2ib6:10/0 lens 520/456 e 1 to 0 dl 1564285480 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:38 fir-md1-s1 kernel: LustreError: 22974:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2f21c4dc50 x1633754550415024/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:9/0 lens 488/440 e 1 to 0 dl 1564285479 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:40 fir-md1-s1 kernel: LustreError: 21039:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 15+0s req@ffff8f2f21c48450 
x1639837452881008/t0(0) o4->6052c41b-9004-bcc3-dbad-bff4bc2f2f04@10.8.14.5@o2ib6:10/0 lens 520/456 e 1 to 0 dl 1564285480 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6052c41b-9004-bcc3-dbad-bff4bc2f2f04 (at 10.8.14.5@o2ib6), client will retry: rc = -110 Jul 27 20:44:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 20:44:40 fir-md1-s1 kernel: Lustre: 20991:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1c6234d050 x1639960307448304/t0(0) o37->f514cc7a-9bbf-6a9c-dfda-7e21d4d17fbe@10.8.9.9@o2ib6:15/0 lens 448/440 e 1 to 0 dl 1564285485 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:45 fir-md1-s1 kernel: LustreError: 27063:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1c6234d050 x1639960307448304/t0(0) o37->f514cc7a-9bbf-6a9c-dfda-7e21d4d17fbe@10.8.9.9@o2ib6:15/0 lens 448/440 e 1 to 0 dl 1564285485 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:50 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f21f3db7c50 x1636424859886704/t0(0) o4->7f5b8d8c-996c-1887-f76d-12c3566ba896@10.8.1.17@o2ib6:25/0 lens 504/448 e 0 to 0 dl 1564285495 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:52 fir-md1-s1 kernel: LustreError: 44034:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f21f3db7c50 x1636424859886704/t0(0) o4->7f5b8d8c-996c-1887-f76d-12c3566ba896@10.8.1.17@o2ib6:25/0 lens 504/448 e 0 to 0 dl 1564285495 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 20:44:52 fir-md1-s1 kernel: LustreError: 44034:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 20:44:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 7f5b8d8c-996c-1887-f76d-12c3566ba896 (at 10.8.1.17@o2ib6), client will retry: rc = -110 Jul 27 20:49:51 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 20:49:51 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 27 20:50:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 20:50:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 20:51:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 20:51:44 fir-md1-s1 kernel: Lustre: Skipped 178 previous similar messages Jul 27 20:54:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 20:54:21 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 27 20:59:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 20:59:57 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 27 21:01:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 21:01:49 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 27 21:04:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 21:04:33 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 21:10:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 21:10:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 21:10:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 21:10:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 21:11:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 21:11:54 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 27 21:15:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 21:15:18 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 21:20:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 21:20:20 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 27 21:21:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 21:21:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 21:22:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 21:22:02 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 27 21:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 21:25:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 21:30:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 21:30:49 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 27 21:31:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 21:31:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 21:32:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 21:32:03 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 27 21:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 21:35:39 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 27 21:41:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 21:41:54 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 27 21:42:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 21:42:04 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Jul 27 21:44:07 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 27 21:44:07 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 4 previous similar messages Jul 27 21:44:07 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.210@o2ib7 (5): c: 4, oc: 0, rc: 7 Jul 27 21:44:07 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 4 previous similar messages Jul 27 21:44:07 fir-md1-s1 kernel: LNetError: 24070:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 21:44:07 fir-md1-s1 kernel: LustreError: 46586:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f373c0d8a00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f052c69be00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f147b56ce00 Jul 27 21:44:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 18 seconds Jul 27 21:44:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 34 previous similar messages Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f052c698a00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34fdf56600 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f077125d400 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0f07bc8000 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3df66e8000 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f09cb9a0400 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f052c69fc00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1c553cee00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f43ddb61a00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f147b56a400 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f399c6b6200 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f20294b7a00 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f147b56b800 Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2af2628200 Jul 27 21:44:08 fir-md1-s1 kernel: LNetError: 24070:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 43 previous similar messages Jul 27 21:44:08 fir-md1-s1 kernel: LustreError: 24070:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0e74b96400 Jul 27 21:44:09 fir-md1-s1 kernel: LustreError: 21709:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0957e33c50 x1638901061456576/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:22/0 lens 488/440 e 1 to 0 dl 1564289062 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 21:44:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with fba6feb3-1d06-9f10-9905-c04ad67c5c45 (at 10.9.115.13@o2ib4), client will retry: rc -110 Jul 27 21:44:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 27 21:44:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a890aaf8-05fd-cdba-39fc-201f06d6890d (at 10.9.108.55@o2ib4), client will retry: rc = -110 Jul 27 21:44:10 fir-md1-s1 kernel: LustreError: 46539:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3946c8a850 x1638937029130784/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:22/0 lens 488/440 e 1 to 0 dl 1564289062 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 21:44:10 fir-md1-s1 kernel: LustreError: 46539:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 61 previous similar messages Jul 27 21:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 294f669a-76d8-9cb4-d54f-e33a51dba159 (at 10.9.112.11@o2ib4), client will retry: rc -110 Jul 27 21:44:10 fir-md1-s1 kernel: Lustre: 
Skipped 68 previous similar messages Jul 27 21:44:12 fir-md1-s1 kernel: LustreError: 46573:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f247080a050 x1631571090049728/t0(0) o4->6c1d7a0f-3fbd-e272-bbab-46b21ff978f8@10.9.102.69@o2ib4:27/0 lens 504/448 e 1 to 0 dl 1564289067 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 21:44:12 fir-md1-s1 kernel: LustreError: 46573:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 21:44:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6c1d7a0f-3fbd-e272-bbab-46b21ff978f8 (at 10.9.102.69@o2ib4), client will retry: rc = -110 Jul 27 21:44:16 fir-md1-s1 kernel: LustreError: 25997:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f247080a450 x1631542443249552/t0(0) o4->786945fd-d0e4-9127-4dce-4fcd2bed9b64@10.9.105.24@o2ib4:24/0 lens 488/448 e 1 to 0 dl 1564289064 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 21:44:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4), client will retry: rc = -110 Jul 27 21:44:18 fir-md1-s1 kernel: Lustre: 46547:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f247080e450 x1631559346991680/t0(0) o3->cead7d10-a870-f1c4-8ddf-757d1d8e738a@10.9.104.67@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564289063 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 21:44:23 fir-md1-s1 kernel: LustreError: 21390:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 16+0s req@ffff8f247080e450 x1631559346991680/t0(0) o3->cead7d10-a870-f1c4-8ddf-757d1d8e738a@10.9.104.67@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564289063 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 21:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with cead7d10-a870-f1c4-8ddf-757d1d8e738a (at 10.9.104.67@o2ib4), client will retry: rc -110 Jul 27 21:44:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 21:46:09 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 21:46:09 fir-md1-s1 kernel: Lustre: Skipped 127 previous similar messages Jul 27 21:52:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 21:52:20 fir-md1-s1 kernel: Lustre: Skipped 173 previous similar messages Jul 27 21:52:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 21:52:28 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 27 21:53:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 21:53:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 27 21:54:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 21:54:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 21:57:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 21:57:08 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 21:57:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 21:57:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 27 22:02:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 22:02:21 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 27 22:02:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 22:02:29 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 27 22:05:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 22:05:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 22:07:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 22:07:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 27 22:12:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 22:12:23 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 22:12:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 22:12:32 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 27 22:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 22:17:16 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 21682:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=21 reqQ=0 recA=25, svcEst=1, delay=7604 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 21682:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f141cdc2450 x1639512052639856/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/0 e 0 to 0 dl 1564291064 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 81719:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.5@o2ib4: deadline 6:3s ago req@ffff8f141cdc2450 x1639512052639856/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/0 e 0 to 0 dl 1564291064 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 81719:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 61 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. 
req@ffff8f141cdc2450 x1639512052639856/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/0 e 0 to 0 dl 1564291064 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 23556:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564291058/real 1564291058] req@ffff8f0ca8098c00 x1636747951823984/t0(0) o106->fir-MDT0000@10.9.101.4@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564291065 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 21311:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564291058/real 1564291058] req@ffff8f14ae4c4e00 x1636747951823872/t0(0) o106->fir-MDT0000@10.9.101.4@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564291065 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 55490:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 9s req@ffff8f1650be9450 x1639231316308688/t0(0) o400->64eca682-1834-2c93-f474-0d81bb4ed1e8@10.9.103.20@o2ib4:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 55490:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 22 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 13921:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff8f12f6c3d850 x1639237302338704/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1564291064 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 13921:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1edd59f400 Jul 27 22:17:48 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health 
checking (-103, 0) Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2345d82200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:305:request_in_callback()) event type 2, status -103, service mdt_io Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 27481:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f175035a400 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 27481:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.8.8.23@o2ib6 x1637882812961504 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1edd59fc00 Jul 27 22:17:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 3 seconds Jul 27 22:17:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 16 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Jul 27 22:17:48 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.210@o2ib7 (0): c: 1, oc: 0, rc: 8 Jul 27 22:17:48 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f31fc619c00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc 
ffff8f2c9d27d200 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1edd59b200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0799218200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4364bb9800 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10eae5cc00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f31fc61e200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:305:request_in_callback()) event type 2, status -103, service mdt_io Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f24fe5f5800 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f288164aa00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f079921b000 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2f6d86de00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f27c392f200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2c9d27fa00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f2dc4fd4800 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1edd59ea00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b00e7ec00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f73a9a400 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2b01b6ae00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f73a98e00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1859ff4200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f20dcde5c00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1457b0e000 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f42f803aa00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0638bd6800 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ba1a53400 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a37b6b600 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2fb95a0600 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, 
desc ffff8f44f0e0b400 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f6d868200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f079921de00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f10eae5a600 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4364bb9800 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f445e76b800 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f31fc618c00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1859ff2a00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f288164e200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f27c392ba00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4364bbde00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f31fc619800 Jul 27 22:17:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.210@o2ib7: connected Jul 27 22:17:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 2 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: fir-OST0003-osc-MDT0002: Connection to fir-OST0003 (at 10.0.10.102@o2ib7) was lost; in 
progress operations using this service will wait for recovery to complete Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1859ff7200 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0789e7f600 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f73a9ba00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1457b0f000 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2fd474ae00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f445e76da00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0e74b95000 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e98785400 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df66ecc00 Jul 27 22:17:48 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1457b0a600 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 29833:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=38 reqQ=0 recA=18, svcEst=20, delay=7913 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 29833:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 7 previous similar messages Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 
29833:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f33729fb450 x1639299975243712/t0(0) o3->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:15/0 lens 488/0 e 0 to 0 dl 1564291065 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 22:17:48 fir-md1-s1 kernel: Lustre: 29833:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 55 previous similar messages Jul 27 22:17:50 fir-md1-s1 kernel: LustreError: 21716:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f17c39e5850 x1631581695389616/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:8/0 lens 488/440 e 0 to 0 dl 1564291088 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6), client will retry: rc -110 Jul 27 22:17:50 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 27 22:17:50 fir-md1-s1 kernel: LustreError: 21716:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 27 22:17:51 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+7s req@ffff8f4339e0ec50 x1639299975242416/t0(0) o3->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1564291064 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:51 fir-md1-s1 kernel: LustreError: 21532:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+7s req@ffff8f370d6e4c50 x1639237302338832/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1564291064 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:51 fir-md1-s1 kernel: LustreError: 21532:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 35 previous similar messages Jul 27 22:17:51 fir-md1-s1 kernel: Lustre: 21532:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than 
estimated (6:7s); client may timeout. req@ffff8f370d6e4c50 x1639237302338832/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1564291064 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:17:51 fir-md1-s1 kernel: Lustre: 21532:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 84 previous similar messages Jul 27 22:17:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 27 22:17:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 27 22:17:51 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 27 22:17:52 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f170ed2ac50 x1637404229791408/t0(0) o3->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:8/0 lens 488/440 e 0 to 0 dl 1564291088 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:52 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 22:17:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 27 22:17:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 22:17:53 fir-md1-s1 kernel: Lustre: 21716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f24720ae850 x1640015679238960/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:28/0 lens 488/440 e 1 to 0 dl 1564291078 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:54 fir-md1-s1 kernel: Lustre: 21311:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564291067/real 1564291067] req@ffff8f14ae4c4e00 x1636747951823872/t0(0) o106->fir-MDT0000@10.9.101.4@o2ib4:15/16 lens 296/280 
e 0 to 1 dl 1564291074 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 22:17:54 fir-md1-s1 kernel: LustreError: 21364:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3927a16050 x1638087893229360/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564291088 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:54 fir-md1-s1 kernel: LustreError: 21364:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 27 22:17:54 fir-md1-s1 kernel: Lustre: 21311:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 29 previous similar messages Jul 27 22:17:54 fir-md1-s1 kernel: Lustre: 46594:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2340ea5850 x1638871843414480/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564291079 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:54 fir-md1-s1 kernel: Lustre: 46594:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jul 27 22:17:58 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f209269f050 x1631589751223760/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:28/0 lens 488/440 e 1 to 0 dl 1564291078 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:17:58 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 27 22:17:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 27 22:17:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 22:18:00 fir-md1-s1 kernel: Lustre: 46529:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f24720ae450 x1638937068991120/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:28/0 lens 488/440 e 1 to 0 dl 1564291078 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:18:00 fir-md1-s1 kernel: Lustre: 46529:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Jul 27 22:18:00 fir-md1-s1 kernel: LustreError: 46567:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 12+1s req@ffff8f1a2aa44450 x1638901134304608/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564291079 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:00 fir-md1-s1 kernel: LustreError: 46567:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 12 previous similar messages Jul 27 22:18:02 fir-md1-s1 kernel: Lustre: 23588:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564291067/real 1564291067] req@ffff8f3268ccaa00 x1636747951824416/t0(0) o106->fir-MDT0000@10.9.101.4@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564291082 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 22:18:02 fir-md1-s1 kernel: Lustre: 23588:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 27 22:18:02 fir-md1-s1 kernel: Lustre: 46567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2294d63850 x1638832404479936/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564291087 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:02 fir-md1-s1 kernel: Lustre: 46567:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 22:18:07 fir-md1-s1 kernel: LustreError: 46521:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2a137c4850 x1637107901352064/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564291087 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:07 
fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 27 22:18:07 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 27 22:18:07 fir-md1-s1 kernel: LustreError: 46521:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 27 22:18:09 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2ec03b8c50 x1638884197980272/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1564291097 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:09 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 27 22:18:10 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f141cdc7850 x1631634022424080/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564291088 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:18:10 fir-md1-s1 kernel: Lustre: 21485:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 22:18:15 fir-md1-s1 kernel: Lustre: 46569:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:7s); client may timeout. 
req@ffff8f2340ea5050 x1638901134303888/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564291088 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:18:15 fir-md1-s1 kernel: Lustre: 46569:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 27 22:18:15 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20b7d78f00 x1631866755216416/t0(0) o101->9c540990-8457-458f-eb50-06c483166dd3@10.8.8.21@o2ib6:20/0 lens 600/3264 e 0 to 0 dl 1564291100 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:15 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Jul 27 22:18:16 fir-md1-s1 kernel: LustreError: 22958:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+8s req@ffff8f33729ff450 x1631634022423664/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564291088 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:16 fir-md1-s1 kernel: LustreError: 22958:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 11 previous similar messages Jul 27 22:18:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.17.25@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3969da72c0/0x5d9ee68f12a64030 lrc: 3/0,0 mode: PR/PR res: [0x2c002c595:0x1fa0a:0x0].0x0 bits 0x1b/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.8.17.25@o2ib6 remote: 0x87bb6b4bea88d5a3 expref: 5976 pid: 23590 timeout: 3406158 lvb_type: 0 Jul 27 22:18:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 27 22:18:18 fir-md1-s1 kernel: LustreError: 97669:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1a8bf54800 x1636747952133952/t0(0) 
o104->fir-MDT0002@10.8.17.25@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 27 22:18:19 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d46018900 x1637400162176352/t0(0) o101->5890eb4b-33c1-2ed3-4d2b-60df28cbaad8@10.8.8.25@o2ib6:24/0 lens 576/3264 e 0 to 0 dl 1564291104 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:18:19 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 27 22:21:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ed5b0000, cur 1564291262 expire 1564291112 last 1564291035 Jul 27 22:22:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2de0b757-9b79-a15e-4447-cea1268e488d (at 10.9.104.15@o2ib4) Jul 27 22:22:28 fir-md1-s1 kernel: Lustre: Skipped 2598 previous similar messages Jul 27 22:24:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 22:24:05 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 27 22:24:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 22:24:33 fir-md1-s1 kernel: Lustre: Skipped 837 previous similar messages Jul 27 22:27:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 27 22:27:22 fir-md1-s1 kernel: Lustre: Skipped 1747 previous similar messages Jul 27 22:32:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 27 22:32:46 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 27 22:34:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 22:34:34 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 27 22:35:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 22:35:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 27 22:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 22:37:52 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 27 22:43:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 22:43:11 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 27 22:44:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 22:44:38 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 27 22:44:50 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 27 22:44:50 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 36 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21416:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=1, svcEst=1, delay=7621 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21416:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 22427:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.114.7@o2ib4: deadline 6:2s ago req@ffff8f112229fc50 x1631634093582496/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:12/0 lens 488/0 e 0 to 0 dl 1564292832 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 18782:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0d88737450 x1639244063830816/t0(0) o3->d958ad69-3bbc-9cba-9027-0e7e6ffc5069@10.9.115.8@o2ib4:12/0 lens 488/0 e 0 to 0 dl 1564292832 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 22427:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 33 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 18782:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 37 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 22427:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. req@ffff8f112229fc50 x1631634093582496/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:12/0 lens 488/0 e 0 to 0 dl 1564292832 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 22427:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 20476:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f3fd6ce3900 x1638797982651744/t0(0) o35->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 20476:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 18 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 20244:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564292826/real 0] req@ffff8f3fd6ce7200 x1636747965675440/t0(0) o13->fir-OST0011-osc-MDT0000@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564292833 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: fir-OST0011-osc-MDT0000: Connection to fir-OST0011 (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 27 22:47:15 
fir-md1-s1 kernel: LustreError: 46563:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f16c447c850 x1638087961433120/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564292832 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 46563:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c222600 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -110 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0618702400 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dd1b48e00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2506c7e400 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f147b56be00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1e81d87800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0618706200 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3df66e8000 Jul 27 22:47:15 fir-md1-s1 kernel: 
LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10eae58200 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c174ca000 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f170632e200 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e81d87600 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3668a95000 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c222600 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f147b568c00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 21743:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3e3991f050 x1638931616694656/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564292832 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1389a99800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ba1a52400 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1389a9bc00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f124abe8200 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1389a9d800 Jul 27 22:47:15 
fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0bc6f23a00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df66ede00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f077125f800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10eae58200 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1706329800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f147b569800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f124abe8a00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df66e9400 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dd1b49000 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0bad78e600 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f147b56d400 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4364bb9800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df66e9e00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f170632bc00 Jul 27 
22:47:15 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e81d80c00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0bad78ee00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f73a9ea00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f43ddb61a00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4314ac1c00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223c223a00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f07faf0dc00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f052c699400 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0e74b90600 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f052c699800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 21294:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2c43c55050 x1638931616695632/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564292846 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0bc6f21c00 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f1f73a9f600 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dd1b49800 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0618702400 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with baaf9aa6-d6ac-d219-ff91-f47dd67dd412 (at 10.8.29.6@o2ib6), client will retry: rc = -110 Jul 27 22:47:15 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e81d82400 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21996:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=23 reqQ=0 recA=11, svcEst=20, delay=7663 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21996:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 5 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21996:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f40667f9450 x1638871878150688/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564292832 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21996:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 55 previous similar messages Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21864:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. 
req@ffff8f2908cf5a00 x1631610433600944/t0(0) o37->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:12/0 lens 448/408 e 0 to 0 dl 1564292832 ref 1 fl Complete:/0/0 rc -110/-110 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with baaf9aa6-d6ac-d219-ff91-f47dd67dd412 (at 10.8.29.6@o2ib6), client will retry: rc = -110 Jul 27 22:47:15 fir-md1-s1 kernel: Lustre: 21864:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 71 previous similar messages Jul 27 22:47:17 fir-md1-s1 kernel: LustreError: 24572:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f27dfe1dc50 x1635100953608000/t0(0) o4->d3f5a0da-73c5-66d2-9b6e-32f6ac286de2@10.9.104.69@o2ib4:1/0 lens 488/448 e 1 to 0 dl 1564292851 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:17 fir-md1-s1 kernel: LustreError: 24572:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 10 previous similar messages Jul 27 22:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with d3f5a0da-73c5-66d2-9b6e-32f6ac286de2 (at 10.9.104.69@o2ib4), client will retry: rc = -110 Jul 27 22:47:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 22:47:19 fir-md1-s1 kernel: LustreError: 46571:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+7s req@ffff8f119a793850 x1639300033514176/t0(0) o3->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564292832 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 27 22:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b9b7d443-6e99-c10b-4d68-3e3fa30c5530 (at 10.9.113.5@o2ib4), client will retry: rc -110 Jul 27 22:47:19 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 27 22:47:19 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 27 22:47:19 fir-md1-s1 kernel: Lustre: 
22434:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:7s); client may timeout. req@ffff8f376d7d7050 x1631634093581936/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564292832 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:47:19 fir-md1-s1 kernel: Lustre: 21534:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:7s); client may timeout. req@ffff8f3e3991f450 x1639512133427280/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564292832 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:47:19 fir-md1-s1 kernel: LustreError: 21038:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2ea1d90050 x1631567890384704/t0(0) o4->dacb83f0-b432-ea21-cf1b-fb1ac63fd0b0@10.9.101.62@o2ib4:14/0 lens 488/448 e 0 to 0 dl 1564292864 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:19 fir-md1-s1 kernel: LustreError: 21038:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 27 22:47:19 fir-md1-s1 kernel: LustreError: 46571:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 45 previous similar messages Jul 27 22:47:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 2dd7454a-4666-cb77-2a9b-10ada81c5a76 (at 10.8.18.27@o2ib6), client will retry: rc = -110 Jul 27 22:47:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 27 22:47:21 fir-md1-s1 kernel: Lustre: 46512:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f27e9661c50 x1631634093581184/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564292846 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:22 fir-md1-s1 kernel: Lustre: 29831:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f28e4c71050 x1634178198097392/t0(0) 
o4->4eb33ecd-a5f0-193d-5f26-5af6c5e43062@10.9.109.68@o2ib4:27/0 lens 488/448 e 1 to 0 dl 1564292847 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:22 fir-md1-s1 kernel: Lustre: 29831:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Jul 27 22:47:26 fir-md1-s1 kernel: Lustre: 21038:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f28e4c72050 x1631547360738544/t0(0) o4->945dc408-181f-0944-b51b-da16ad8b5610@10.9.107.40@o2ib4:1/0 lens 488/448 e 1 to 0 dl 1564292851 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:26 fir-md1-s1 kernel: Lustre: 21038:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 27 22:47:26 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f27c43e5c50 x1638087961433376/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564292846 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 27 22:47:26 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 27 22:47:26 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Jul 27 22:47:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6f1cee30-1c66-44e2-2cfd-ee4c5e5568e6 (at 10.8.2.34@o2ib6), client will retry: rc = -110 Jul 27 22:47:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 22:47:31 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1722870050 x1637107922922784/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564292856 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:31 
fir-md1-s1 kernel: Lustre: 21449:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 27 22:47:31 fir-md1-s1 kernel: LustreError: 46522:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f27a8739850 x1633737435551440/t0(0) o4->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:4/0 lens 488/448 e 1 to 0 dl 1564292854 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:31 fir-md1-s1 kernel: LustreError: 46522:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 9 previous similar messages Jul 27 22:47:34 fir-md1-s1 kernel: LustreError: 21742:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f40667f8450 x1638937134949072/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564292854 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 23dbfbee-8f3b-27e7-f711-fd69cc641360 (at 10.9.115.10@o2ib4), client will retry: rc -110 Jul 27 22:47:34 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 27 22:47:34 fir-md1-s1 kernel: LustreError: 21742:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 18 previous similar messages Jul 27 22:47:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 22:47:39 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 22:47:39 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1ff23da450 x1638252020736176/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1564292864 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:39 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Jul 27 22:47:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 9cc377d2-4caf-4b76-44e8-690a24d9f29f (at 10.9.107.19@o2ib4), client will retry: rc = -110 Jul 27 22:47:40 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 27 22:47:40 fir-md1-s1 kernel: Lustre: 21498:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f28e4c76450 x1635348278737136/t0(0) o4->9cc377d2-4caf-4b76-44e8-690a24d9f29f@10.9.107.19@o2ib4:8/0 lens 488/448 e 0 to 0 dl 1564292858 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 22:47:40 fir-md1-s1 kernel: Lustre: 21498:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 22:47:51 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.28@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f23e419b180/0x5d9ee68f2cbfe433 lrc: 3/0,0 mode: PR/PR res: [0x200029cda:0x1db8:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.9.104.28@o2ib4 remote: 0xb1c962f220aefe02 expref: 15264 pid: 97644 timeout: 3407931 lvb_type: 0 Jul 27 22:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5752787f-9b8e-b81a-8dcf-5d80c1148661 (at 10.8.1.4@o2ib6) reconnecting Jul 27 22:47:52 fir-md1-s1 kernel: Lustre: Skipped 838 previous similar messages Jul 27 22:47:56 fir-md1-s1 kernel: 
Lustre: 10309:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3976707800 x1638868774411472/t0(0) o101->02c26bf3-fa17-a1a8-99bf-7d6ba53ad75c@10.9.106.7@o2ib4:1/0 lens 576/3264 e 0 to 0 dl 1564292881 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:47:56 fir-md1-s1 kernel: Lustre: 10309:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 31 previous similar messages Jul 27 22:48:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 37s: evicting client at 10.9.107.40@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3734185a00/0x5d9ee68f2c354e8f lrc: 3/0,0 mode: PR/PR res: [0x2c002bdde:0xc00c:0x0].0x0 bits 0x13/0x0 rrc: 511 type: IBT flags: 0x60200400000020 nid: 10.9.107.40@o2ib4 remote: 0x420aebec7392fcc7 expref: 3684 pid: 10586 timeout: 3407936 lvb_type: 0 Jul 27 22:48:05 fir-md1-s1 kernel: LustreError: 26254:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f241ad48300 x1636747965748032/t0(0) o104->fir-MDT0002@10.9.109.68@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 27 22:48:05 fir-md1-s1 kernel: LustreError: 26254:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 27 22:53:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 22:53:15 fir-md1-s1 kernel: Lustre: Skipped 1616 previous similar messages Jul 27 22:54:13 fir-md1-s1 kernel: Lustre: 13921:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0d40040c50 x1638087973648672/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564293258 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 22:54:13 fir-md1-s1 kernel: Lustre: 13921:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 26 previous similar messages Jul 27 22:54:28 fir-md1-s1 kernel: 
LustreError: 57787:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0d40040c50 x1638087973648672/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564293258 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 22:54:28 fir-md1-s1 kernel: LustreError: 57787:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 2 previous similar messages Jul 27 22:54:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -107 Jul 27 22:54:28 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 27 22:54:28 fir-md1-s1 kernel: Lustre: 57787:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:10s); client may timeout. req@ffff8f0d40040c50 x1638087973648672/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564293258 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 27 22:54:28 fir-md1-s1 kernel: Lustre: 57787:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 22:55:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 27 22:55:40 fir-md1-s1 kernel: Lustre: Skipped 488 previous similar messages Jul 27 22:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 27 22:58:19 fir-md1-s1 kernel: Lustre: Skipped 276 previous similar messages Jul 27 23:01:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 23:01:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 23:03:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 23:03:26 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 27 23:05:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 23:05:43 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 27 23:08:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 23:08:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 27 23:12:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 23:12:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 23:13:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 27 23:13:27 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 27 23:17:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 23:17:27 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 27 23:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 23:18:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). 
Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 35232:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=11 reqQ=0 recA=18, svcEst=1, delay=6845 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 35232:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f0d0fab9c50 x1638908140847744/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 488/0 e 0 to 0 dl 1564294890 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 21906:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f275d216c00 x1639244113541968/t0(0) o37->d958ad69-3bbc-9cba-9027-0e7e6ffc5069@10.9.115.8@o2ib4:0/0 lens 448/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 35232:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 21906:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 35239:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.8@o2ib4: deadline 6:1s ago req@ffff8f0ceb61e050 x1634531595412032/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:0/0 lens 488/0 e 0 to 0 dl 1564294890 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 35239:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 10 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 22157:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. 
req@ffff8f1ad2ee3050 x1638887318846608/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:29/0 lens 488/408 e 0 to 0 dl 1564294889 ref 2 fl Complete:/0/0 rc 131072/131072 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 22157:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 23565:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564294884/real 0] req@ffff8f0b10635400 x1636747981426656/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 1 dl 1564294891 ref 3 fl Rpc:X/0/ffffffff rc 0/-1 Jul 27 23:21:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (7): c: 5, oc: 0, rc: 8 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 23565:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 26 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46573:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f1ad2ee5850 x1638832492892032/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:29/0 lens 488/440 e 0 to 0 dl 1564294889 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46573:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 23 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 1080 seconds Jul 27 23:21:32 fir-md1-s1 kernel: LNet: 
20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 27 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f223c223800 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1e061bc800 Jul 27 23:21:32 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 27 23:21:32 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 45 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f394c604200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f394c607400 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f07fc747200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1165a4a800 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2fd4749800 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f20f520d600 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2fd474cc00 Jul 27 23:21:32 fir-md1-s1 kernel: LNetError: 20506:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.11.15@o2ib6 from 10.0.10.51@o2ib7 Jul 27 23:21:32 fir-md1-s1 kernel: LNetError: 20506:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 34 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 
20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f33d2b7ae00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21539:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f33d2b79600 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46521:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2b1628c400 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20505:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f43f6f98a00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46523:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2fd474e200 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d958ad69-3bbc-9cba-9027-0e7e6ffc5069 (at 10.9.115.8@o2ib4), client will retry: rc -110 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21793:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f348424cc00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21039:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2b1628a000 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46516:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2fd474a400 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21450:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2de86d9200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21292:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f33d2b7ee00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 25997:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3df66eb000 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20f520aa00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 
21708:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3df66ee200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2de86dc200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33d2b7e000 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21716:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f20f520aa00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 21294:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1165a4f600 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46534:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1e061be000 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 22058:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f43f6f99200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 46536:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f17c4acbe00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17c4ace800 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 24563:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1b0be10600 Jul 27 23:21:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.52@o2ib7: connected Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df66ea400 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2de86d8000 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e061bb200 Jul 27 23:21:32 fir-md1-s1 kernel: 
LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2de86dc400 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1165a4e800 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17c4acb600 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ea4274200 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e061bd400 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1c83a18c00 Jul 27 23:21:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f394c604a00 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 10502:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=1, svcEst=20, delay=6388 Jul 27 23:21:32 fir-md1-s1 kernel: Lustre: 10502:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 9 previous similar messages Jul 27 23:21:33 fir-md1-s1 kernel: LustreError: 22973:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f31e6478850 x1631589781589600/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1564294914 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:33 fir-md1-s1 kernel: LustreError: 22973:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 27 23:21:35 fir-md1-s1 kernel: LustreError: 46517:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f31e647a050 x1638887959553568/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1564294914 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 27 23:21:35 fir-md1-s1 kernel: LustreError: 46517:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 27 23:21:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 27 23:21:35 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 27 23:21:36 fir-md1-s1 kernel: LustreError: 71842:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+6s req@ffff8f0b10633000 x1637986378424080/t0(0) o37->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:0/0 lens 448/440 e 0 to 0 dl 1564294890 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:36 fir-md1-s1 kernel: Lustre: 13960:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. req@ffff8f08a4783850 x1638908140846736/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564294890 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 23:21:36 fir-md1-s1 kernel: Lustre: 13960:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 52 previous similar messages Jul 27 23:21:36 fir-md1-s1 kernel: LustreError: 71842:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 23 previous similar messages Jul 27 23:21:38 fir-md1-s1 kernel: LustreError: 21737:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f28c1867050 x1638868450900640/t0(0) o3->8df94149-5690-262d-f805-cc7898f99b40@10.8.16.5@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1564294914 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:38 fir-md1-s1 kernel: Lustre: 21455:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564294891/real 1564294891] req@ffff8f18dc2ece00 x1636747981426640/t0(0) o106->fir-MDT0000@10.8.26.34@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564294898 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 27 23:21:38 fir-md1-s1 kernel: Lustre: 
21455:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 27 23:21:38 fir-md1-s1 kernel: LustreError: 21737:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 27 23:21:38 fir-md1-s1 kernel: Lustre: 38767:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3816e9b450 x1638908140846192/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564294903 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:43 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f365c686850 x1638871906697152/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564294903 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:43 fir-md1-s1 kernel: LustreError: 66902:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f4054bef050 x1639512226751856/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564294903 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:43 fir-md1-s1 kernel: LustreError: 21514:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 27 23:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -110 Jul 27 23:21:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 27 23:21:46 fir-md1-s1 kernel: Lustre: 14790:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f08a4785050 x1638832492892464/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564294911 ref 2 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:46 fir-md1-s1 kernel: Lustre: 14790:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages Jul 27 
23:21:54 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f08a4780850 x1638087996347984/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:1/0 lens 488/440 e 0 to 0 dl 1564294921 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:55 fir-md1-s1 kernel: Lustre: 24069:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f081a6ea450 x1638763420939696/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564294914 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 27 23:21:55 fir-md1-s1 kernel: Lustre: 24069:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 27 23:21:59 fir-md1-s1 kernel: LustreError: 21735:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+6s req@ffff8f2971734c50 x1637107969871984/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564294913 ref 1 fl Interpret:/0/0 rc 0/0 Jul 27 23:21:59 fir-md1-s1 kernel: LustreError: 21735:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 68 previous similar messages Jul 27 23:22:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 27 23:22:01 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 27 23:23:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 23:23:44 fir-md1-s1 kernel: Lustre: Skipped 1394 previous similar messages Jul 27 23:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 23:28:54 fir-md1-s1 kernel: Lustre: Skipped 941 previous similar messages Jul 27 23:29:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export 
from same NID Jul 27 23:29:23 fir-md1-s1 kernel: Lustre: Skipped 470 previous similar messages Jul 27 23:31:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 27 23:31:32 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 27 23:33:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 27 23:33:45 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 27 23:39:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 27 23:39:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 27 23:39:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 23:39:49 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 27 23:43:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 23:43:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 27 23:43:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 23:43:55 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 27 23:49:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 27 23:49:51 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 27 23:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 27 23:50:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 27 23:54:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 27 23:54:01 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 27 23:56:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 27 23:56:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 00:00:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 00:00:31 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 28 00:00:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 00:00:33 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 28 00:04:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 00:04:27 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 00:07:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 00:07:09 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 00:10:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 00:10:36 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 28 00:10:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 00:10:59 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 28 00:14:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 00:14:27 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 00:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 00:18:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 00:20:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 00:20:42 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 28 00:21:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 00:21:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 28 00:24:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 00:24:33 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 28 00:30:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 00:30:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 00:31:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 00:31:09 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 28 00:31:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 00:31:29 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 00:34:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 00:34:34 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 28 00:40:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 00:40:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 00:41:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 00:41:57 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 00:42:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 00:42:52 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 28 00:44:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 00:44:46 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 28 00:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 00:53:46 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 00:54:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 00:54:05 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 28 00:55:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 00:55:35 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 28 00:58:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 00:58:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 01:05:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 01:05:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 01:05:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 01:05:40 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 01:05:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45354d2800, cur 1564301147 expire 1564300997 last 1564300920 Jul 28 01:06:13 fir-md1-s1 kernel: Lustre: 47044:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f39f9bc0300 x1634937021864640/t0(0) o103->a2d1cfa6-4e2d-7226-3700-dc24c44c8e97@10.9.108.16@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 4 previous similar messages Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 2, rc: 8 Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 4 previous similar messages Jul 28 01:06:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds Jul 28 01:06:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 9 previous similar messages Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 46573:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no 
route to 10.8.1.35@o2ib6 from 10.0.10.51@o2ib7 Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 46573:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 22 previous similar messages Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 24569:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2b8e534e00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 46555:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f38dcf62400 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 48193:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f267ce35a00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 46523:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f267ce36400 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 6547:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f267ce30800 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 46552:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2345d80800 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 21388:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3cba71f200 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 21792:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3cba71c000 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 21036:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f4089be6e00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 48194:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3cba718e00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 21538:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3cba71fc00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 21498:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2f36b31e00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 24570:0:(events.c:450:server_bulk_callback()) 
event type 5, status -113, desc ffff8f0564f00e00 Jul 28 01:06:13 fir-md1-s1 kernel: LustreError: 46518:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0564f00400 Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 01:06:13 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 12 previous similar messages Jul 28 01:06:13 fir-md1-s1 kernel: Lustre: 47044:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 13 previous similar messages Jul 28 01:06:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 01:06:14 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 01:06:15 fir-md1-s1 kernel: LustreError: 48194:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f29eb2e2850 x1634122341477792/t0(0) o4->b37c54be-7fed-724b-d760-c5bd71b2a4e0@10.8.29.5@o2ib6:3/0 lens 488/448 e 1 to 0 dl 1564301193 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 01:06:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with b37c54be-7fed-724b-d760-c5bd71b2a4e0 (at 10.8.29.5@o2ib6), client will retry: rc = -110 Jul 28 01:06:15 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 28 01:06:15 fir-md1-s1 kernel: LustreError: 48194:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 28 01:06:17 fir-md1-s1 kernel: LustreError: 24570:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2ee83ab850 x1631610506084832/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:3/0 lens 488/440 e 1 to 0 dl 1564301193 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 01:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6), client will retry: rc -110 Jul 28 01:06:17 fir-md1-s1 kernel: 
Lustre: Skipped 1 previous similar message Jul 28 01:06:21 fir-md1-s1 kernel: LustreError: 21036:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f282ffdb050 x1631305428238128/t0(0) o4->92b6a633-30cd-12b4-adc7-75b3b4fa1ab8@10.8.10.17@o2ib6:3/0 lens 488/448 e 1 to 0 dl 1564301193 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 01:06:21 fir-md1-s1 kernel: LustreError: 21036:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 28 01:06:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 92b6a633-30cd-12b4-adc7-75b3b4fa1ab8 (at 10.8.10.17@o2ib6), client will retry: rc = -110 Jul 28 01:06:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 01:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 28 01:06:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 28 01:06:28 fir-md1-s1 kernel: Lustre: 21245:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f31e64a9850 x1631581839756016/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:3/0 lens 488/440 e 1 to 0 dl 1564301193 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 01:06:28 fir-md1-s1 kernel: Lustre: 21245:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 66 previous similar messages Jul 28 01:06:33 fir-md1-s1 kernel: LustreError: 6547:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f326a4b7450 x1635714193436384/t0(0) o3->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:3/0 lens 488/440 e 1 to 0 dl 1564301193 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 01:06:33 fir-md1-s1 kernel: LustreError: 46523:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f31e64a9850 x1631581839756016/t0(0) o3->3d29c3e1-3431-278f-589f-781a7b3c90ae@10.8.16.6@o2ib6:3/0 lens 488/440 e 1 to 0 dl 1564301193 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 28 01:06:33 fir-md1-s1 kernel: LustreError: 46523:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 28 01:06:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6), client will retry: rc -110 Jul 28 01:06:33 fir-md1-s1 kernel: LustreError: 6547:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 28 01:06:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 37322032-bc61-10be-2bac-d4651ae05719 (at 10.8.20.16@o2ib6), client will retry: rc = -110 Jul 28 01:06:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 01:07:21 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 28 01:10:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 01:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 01:15:30 fir-md1-s1 kernel: Lustre: Skipped 222 previous similar messages Jul 28 01:15:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 01:15:41 fir-md1-s1 kernel: Lustre: Skipped 324 previous similar messages Jul 28 01:16:06 fir-md1-s1 kernel: Lustre: 35231:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f060ad4c450 x1639512514341920/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:11/0 lens 488/440 e 1 to 0 dl 1564301771 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 01:16:06 fir-md1-s1 kernel: Lustre: 35231:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 28 01:16:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 01:16:20 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 28 01:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 253b4c5b-4ff4-3bf5-58fe-413737b1d5c2 (at 10.8.20.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4504faa400, cur 1564301788 expire 1564301638 last 1564301561 Jul 28 01:23:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 01:23:46 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 28 01:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 01:25:49 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 01:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 01:25:49 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 28 01:27:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 01:27:29 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 28 01:36:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 01:36:10 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 01:36:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 01:36:14 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 01:37:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 01:37:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 01:37:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 01:37:56 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 28 01:46:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 01:46:14 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 28 01:46:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 01:46:25 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 01:48:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 01:48:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 01:48:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 01:48:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 28 01:49:19 fir-md1-s1 kernel: Lustre: 49252:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0d11e56c50 x1638826260023552/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564303764 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 01:49:33 fir-md1-s1 kernel: LustreError: 21709:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0d11e56c50 x1638826260023552/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564303764 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 01:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f7baec68-f8c8-0730-9508-ba1e77698953 (at 10.9.114.6@o2ib4), client will retry: rc -107 Jul 28 01:49:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 28 01:49:33 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:9s); client may timeout. 
req@ffff8f0d11e56c50 x1638826260023552/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564303764 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 01:49:33 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 28 01:56:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 01:56:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 28 01:57:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 01:57:34 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 28 01:58:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 01:58:06 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 01:59:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 01:59:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 02:01:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f70011800, cur 1564304513 expire 1564304363 last 1564304286 Jul 28 02:01:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 02:06:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 02:06:31 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 28 02:08:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 02:08:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 02:08:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 02:08:29 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 02:12:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 02:16:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 02:16:52 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 28 02:18:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 02:18:57 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 02:19:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 02:19:26 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 02:24:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 02:24:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 02:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 02:26:53 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 28 02:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 02:29:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 02:30:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 02:30:15 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 28 02:34:48 fir-md1-s1 kernel: Lustre: 13921:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f063f8f2850 x1638871630746416/t0(0) o3->e3c32682-5f6c-0001-d03b-79e797f51faf@10.9.115.5@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564306493 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 02:34:57 fir-md1-s1 kernel: LustreError: 49249:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f063f8f2850 x1638871630746416/t0(0) o3->e3c32682-5f6c-0001-d03b-79e797f51faf@10.9.115.5@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564306493 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 02:34:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e3c32682-5f6c-0001-d03b-79e797f51faf (at 10.9.115.5@o2ib4), client will retry: rc -107 Jul 28 02:34:57 fir-md1-s1 kernel: Lustre: 49249:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. 
req@ffff8f063f8f2850 x1638871630746416/t0(0) o3->e3c32682-5f6c-0001-d03b-79e797f51faf@10.9.115.5@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564306493 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 02:36:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 02:36:58 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 28 02:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 02:39:45 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 28 02:40:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 02:40:18 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 28 02:47:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 02:47:00 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 28 02:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 02:49:46 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 02:50:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 02:50:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 02:50:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 02:50:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 28 02:57:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 02:57:03 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 28 02:58:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 03:00:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 03:00:11 fir-md1-s1 kernel: Lustre: Skipped 110413 previous similar messages Jul 28 03:00:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 03:00:19 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 03:04:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 03:04:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 03:07:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 03:07:10 fir-md1-s1 kernel: Lustre: Skipped 110455 previous similar messages Jul 28 03:10:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 03:10:26 fir-md1-s1 kernel: Lustre: Skipped 52398 previous similar messages Jul 28 03:10:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 03:10:38 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 03:14:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 03:17:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 03:17:13 fir-md1-s1 kernel: Lustre: Skipped 52442 previous similar messages Jul 28 03:20:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 03:20:32 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 28 03:24:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 03:24:30 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 28 03:26:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 03:26:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 03:28:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 03:28:14 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 28 03:30:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 03:30:46 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 28 03:32:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 28 03:32:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (5): c: 2, oc: 0, rc: 7 Jul 28 03:32:36 fir-md1-s1 kernel: LustreError: 21043:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f3118ff9050 x1638888293959632/t0(0) o3->11f7dba6-7171-5836-2062-1974c5637c6a@10.8.28.11@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564309956 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:36 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 03:32:36 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0957e7f200 Jul 28 03:32:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c94251200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f360ce62a00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f393d62dc00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0f52fc4200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f07598fb200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f389dfeae00 Jul 28 03:32:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.209@o2ib7: 0 seconds Jul 28 03:32:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f082e66de00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e189ef200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e189e9800 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f19e696aa00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44b86ec200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3fdae3f600 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f062243ce00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f38dcf61200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0957e7ac00 Jul 
28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39a63f6a00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0c7a636e00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f393d62ee00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f25347df600 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f133ac38a00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39a63f4800 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e189ee800 Jul 28 03:32:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.209@o2ib7: accepting Jul 28 03:32:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 2 previous similar messages Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f348424ae00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f33d74d4e00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f33d74d0000 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f21bafb2000 Jul 28 03:32:37 fir-md1-s1 kernel: LNetError: 16186:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.9.108.55@o2ib4 from 10.0.10.51@o2ib7 
Jul 28 03:32:37 fir-md1-s1 kernel: LNetError: 16186:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 288 previous similar messages Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c94252600 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f300e9de800 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b15c1e200 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f393d628000 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f393d62d000 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f393d628600 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350c7ca400 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0b15c19800 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1864192000 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f44ea24d800 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f39a3f29c00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f348424d800 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f389dfeba00 
Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3df66ed000 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 21043:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 28 previous similar messages Jul 28 03:32:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 11f7dba6-7171-5836-2062-1974c5637c6a (at 10.8.28.11@o2ib6), client will retry: rc -110 Jul 28 03:32:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f33d74d1200 Jul 28 03:32:37 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 28 03:32:37 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 20 previous similar messages Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2888dbf600 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0957e7b600 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f21bafb2a00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f19e6968e00 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 22649:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2312fc8050 x1638901757557744/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564309971 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:37 fir-md1-s1 kernel: LustreError: 22649:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 28 03:32:38 fir-md1-s1 kernel: LustreError: 
22156:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2043032450 x1638252503592240/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564309986 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:38 fir-md1-s1 kernel: LustreError: 46536:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2043032c50 x1638901757559728/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564309986 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:38 fir-md1-s1 kernel: LustreError: 46536:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 12 previous similar messages Jul 28 03:32:38 fir-md1-s1 kernel: LustreError: 13961:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0734fa2050 x1638798322132112/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:8/0 lens 488/440 e 0 to 0 dl 1564309988 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -107 Jul 28 03:32:38 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 28 03:32:38 fir-md1-s1 kernel: LustreError: 13961:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 1 previous similar message Jul 28 03:32:40 fir-md1-s1 kernel: LustreError: 21040:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2ea9b18850 x1633754865857632/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:21/0 lens 488/440 e 1 to 0 dl 1564309971 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 28 03:32:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 28 03:32:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with baaf9aa6-d6ac-d219-ff91-f47dd67dd412 (at 
10.8.29.6@o2ib6), client will retry: rc = -110 Jul 28 03:32:40 fir-md1-s1 kernel: LustreError: 21040:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 49 previous similar messages Jul 28 03:32:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f242de6b-d808-0917-2b3c-5d74c45834f3 (at 10.8.26.2@o2ib6), client will retry: rc = -110 Jul 28 03:32:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 28 03:32:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 53803d2a-ea9e-0335-702a-3d9daed0d916 (at 10.8.22.17@o2ib6), client will retry: rc = -110 Jul 28 03:32:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 28 03:32:44 fir-md1-s1 kernel: LustreError: 35232:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f14f04fd850 x1638932173766560/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564309971 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:44 fir-md1-s1 kernel: LustreError: 35232:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 20 previous similar messages Jul 28 03:32:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -110 Jul 28 03:32:44 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 03:32:45 fir-md1-s1 kernel: Lustre: 24567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34e6396050 x1634980490118864/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:20/0 lens 488/440 e 1 to 0 dl 1564309970 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:46 fir-md1-s1 kernel: Lustre: 20505:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2dc473d450 x1638722993063152/t0(0) o3->cc7042ec-251b-fe73-dc93-9545d29323f6@10.8.27.21@o2ib6:21/0 lens 488/440 e 1 to 0 dl 1564309971 ref 2 fl Interpret:/0/0 rc 0/0 
Jul 28 03:32:50 fir-md1-s1 kernel: LustreError: 49471:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f34e6396050 x1634980490118864/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:20/0 lens 488/440 e 1 to 0 dl 1564309970 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:51 fir-md1-s1 kernel: Lustre: 20506:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f33c1475c50 x1631547897475376/t0(0) o4->aec69d6f-8b9d-1fe2-74fb-aa6ac6ee7bb1@10.9.106.63@o2ib4:26/0 lens 504/448 e 1 to 0 dl 1564309976 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:51 fir-md1-s1 kernel: Lustre: 20506:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Jul 28 03:32:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a890aaf8-05fd-cdba-39fc-201f06d6890d (at 10.9.108.55@o2ib4), client will retry: rc = -110 Jul 28 03:32:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 28 03:32:51 fir-md1-s1 kernel: LustreError: 20504:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 15+0s req@ffff8f23ba6b4850 x1631590008068960/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:21/0 lens 488/440 e 1 to 0 dl 1564309971 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:51 fir-md1-s1 kernel: LustreError: 20504:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 28 03:32:56 fir-md1-s1 kernel: LustreError: 46552:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2043030050 x1631458662122928/t0(0) o3->bb0a5132-bb89-b076-3d1c-a0a716c38321@10.8.12.3@o2ib6:26/0 lens 488/440 e 1 to 0 dl 1564309976 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:32:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 62c3a024-34de-fd61-6956-bb3675e9d145 (at 10.8.1.13@o2ib6), client will retry: rc -110 Jul 28 03:32:56 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 
03:32:56 fir-md1-s1 kernel: LustreError: 46552:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 28 03:33:01 fir-md1-s1 kernel: Lustre: 21881:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2b9a206c00 x1633662632627696/t0(0) o37->60a9f157-4802-e53d-dccf-19f0d690f2d1@10.9.0.1@o2ib4:6/0 lens 448/440 e 0 to 0 dl 1564309986 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 03:33:01 fir-md1-s1 kernel: Lustre: 21881:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 28 03:33:05 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2043034c50 x1635091968709024/t0(0) o4->c56b34d4-3ae6-19a5-6d19-cc66577d2e25@10.9.102.17@o2ib4:6/0 lens 504/448 e 0 to 0 dl 1564309986 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:33:05 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 9 previous similar messages Jul 28 03:33:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c56b34d4-3ae6-19a5-6d19-cc66577d2e25 (at 10.9.102.17@o2ib4), client will retry: rc = -110 Jul 28 03:33:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 03:33:06 fir-md1-s1 kernel: LustreError: 46592:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f23f3a90050 x1638691855176448/t0(0) o3->ac4e42b8-5648-2511-97b0-70a975af15db@10.8.30.18@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564309986 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 03:33:06 fir-md1-s1 kernel: LustreError: 46592:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 28 03:33:09 fir-md1-s1 kernel: Lustre: 46591:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. 
req@ffff8f225d3f3850 x1637118825328096/t0(0) o3->03459ba8-d420-8fa0-2983-fdf11ef807a0@10.8.7.4@o2ib6:7/0 lens 488/440 e 0 to 0 dl 1564309987 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 03:34:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.12@o2ib6, removing former export from same NID Jul 28 03:34:59 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 28 03:38:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 03:38:32 fir-md1-s1 kernel: Lustre: Skipped 388 previous similar messages Jul 28 03:40:41 fir-md1-s1 kernel: Lustre: 20463:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f266130e050 x1631648400263664/t0(0) o101->8719d679-2033-f46d-d5b4-1da7ad753964@10.8.21.33@o2ib6:16/0 lens 576/3264 e 0 to 0 dl 1564310446 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 03:40:41 fir-md1-s1 kernel: Lustre: 20463:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 28 03:40:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 03:40:46 fir-md1-s1 kernel: Lustre: Skipped 26652 previous similar messages Jul 28 03:41:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 03:41:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 03:45:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 03:45:05 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 03:48:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 03:48:36 fir-md1-s1 kernel: Lustre: Skipped 76914 previous similar messages Jul 28 03:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 03:50:52 fir-md1-s1 kernel: Lustre: Skipped 50536 previous similar messages Jul 28 03:53:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 03:53:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 03:55:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 03:55:29 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 28 03:58:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 03:58:55 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 28 04:01:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 04:01:07 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 04:03:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 04:03:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 04:05:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 04:05:31 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 04:08:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 04:08:57 fir-md1-s1 kernel: Lustre: Skipped 29708 previous similar messages Jul 28 04:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 04:11:21 fir-md1-s1 kernel: Lustre: Skipped 29676 previous similar messages Jul 28 04:14:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 04:14:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 21900:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=1, svcEst=1, delay=8471 Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 21900:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f317b939b00 x1639156891649424/t0(0) o35->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:26/0 lens 392/456 e 0 to 0 dl 1564312496 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 21900:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 56 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f146fd99e00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f146fd9da00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f146fd98a00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f146fd9c000 Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 3 seconds Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 0, rc: 1 Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 3 seconds Jul 28 04:15:00 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 39 previous similar messages 
Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3c94255a00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f162c7a3000 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f38ec70fa00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2e8bca8e00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0bad788c00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0f52fc6800 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1dcfe05200 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1dcfe07000 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f28305bd600 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f28305baa00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1dcfe07200 Jul 28 04:15:00 fir-md1-s1 kernel: LNetError: 21041:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.5@o2ib6 from 10.0.10.51@o2ib7 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 21041:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1b0be11400 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 48197:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1b0be11c00 
Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0f52fc5e00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 46577:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3347fee600 Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:4s); client may timeout. req@ffff8f22c6153900 x1638888061894880/t435089645807(0) o101->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:26/0 lens 1768/1192 e 0 to 0 dl 1564312496 ref 2 fl Complete:/0/0 rc 0/0 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20504:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3347fe8400 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 27583:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2ff13ff600 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 49467:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f440fe3f400 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 23106:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3050715400 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 24568:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3050717c00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 27587:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3347fe9200 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 49463:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f1864192600 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1dcfe06000 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1dcfe02000 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 
20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1dcfe00200 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 49464:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3050716c00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f146fd9de00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f146fd9d800 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f146fd9c600 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0bad789a00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3df66eea00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3c94256800 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3df66ec200 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f082e669c00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3347fe9c00 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f146fd9b000 Jul 28 04:15:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f146fd9b600 Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 20203:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564312500/real 1564312500] req@ffff8f0ec67f8600 
x1636748094198640/t0(0) o13->fir-OST001d-osc-MDT0000@10.0.10.106@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564312507 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 28 04:15:00 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.212@o2ib7: connected Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: fir-OST001d-osc-MDT0000: Connection to fir-OST001d (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 50448:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=4 reqQ=0 recA=0, svcEst=10, delay=8471 Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 50448:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 6 previous similar messages Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 50448:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f1a8da06c00 x1633754884883968/t435089645864(0) o101->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:26/0 lens 1768/1192 e 0 to 0 dl 1564312496 ref 1 fl Complete:/0/0 rc 0/0 Jul 28 04:15:00 fir-md1-s1 kernel: Lustre: 50448:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Jul 28 04:15:01 fir-md1-s1 kernel: LustreError: 20507:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f34d389f450 x1631590062538224/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564312521 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 28 04:15:01 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 28 04:15:01 fir-md1-s1 kernel: LustreError: 20507:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 28 04:15:15 fir-md1-s1 kernel: Lustre: 46514:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d2abec850 x1639194080193984/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564312520 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:16 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f27a1578450 x1640016184762240/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564312521 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 28 04:15:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 04:15:16 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 28 04:15:16 
fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3137211850 x1638884459076848/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564312521 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:16 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 36 previous similar messages Jul 28 04:15:21 fir-md1-s1 kernel: LustreError: 21284:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 21+0s req@ffff8f438ccdbc50 x1638088518473392/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564312521 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e3c32682-5f6c-0001-d03b-79e797f51faf (at 10.9.115.5@o2ib4), client will retry: rc -110 Jul 28 04:15:21 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 28 04:15:21 fir-md1-s1 kernel: LustreError: 21284:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 70 previous similar messages Jul 28 04:15:22 fir-md1-s1 kernel: Lustre: 20506:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f304bada050 x1638792943305376/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564312521 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:15:22 fir-md1-s1 kernel: Lustre: 20506:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 16 previous similar messages Jul 28 04:15:25 fir-md1-s1 kernel: Lustre: 25630:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1bdb26f850 x1638955006363360/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564312530 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:25 fir-md1-s1 kernel: Lustre: 25630:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 90 previous similar messages Jul 28 04:15:28 fir-md1-s1 kernel: LustreError: 46519:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+8s req@ffff8f2d2abec850 x1639194080193984/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564312520 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:28 fir-md1-s1 kernel: LustreError: 46519:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 28 04:15:28 fir-md1-s1 kernel: Lustre: 46519:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:8s); client may timeout. 
req@ffff8f2d2abec850 x1639194080193984/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564312520 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:15:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with efb86e40-78e4-0377-026b-476ce03a25a4 (at 10.8.28.1@o2ib6), client will retry: rc -110 Jul 28 04:15:29 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 28 04:15:30 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1bdb26d450 x1638937700793744/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564312530 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:30 fir-md1-s1 kernel: LustreError: 21543:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1bdb26f850 x1638955006363360/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564312530 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:15:30 fir-md1-s1 kernel: LustreError: 21543:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 42 previous similar messages Jul 28 04:15:31 fir-md1-s1 kernel: Lustre: 48194:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:10s); client may timeout. 
req@ffff8f27a157cc50 x1631610664469008/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564312521 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:15:31 fir-md1-s1 kernel: Lustre: 48194:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 46 previous similar messages Jul 28 04:15:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 04:15:38 fir-md1-s1 kernel: Lustre: Skipped 450 previous similar messages Jul 28 04:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 04:19:00 fir-md1-s1 kernel: Lustre: Skipped 2418 previous similar messages Jul 28 04:22:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 04:22:28 fir-md1-s1 kernel: Lustre: Skipped 1614 previous similar messages Jul 28 04:26:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 04:26:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 04:26:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 04:26:38 fir-md1-s1 kernel: Lustre: Skipped 364 previous similar messages Jul 28 04:29:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 04:29:02 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 28 04:32:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 04:32:28 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 28 04:36:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 04:36:47 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 04:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 04:39:04 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Jul 28 04:40:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 04:40:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 04:42:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 04:42:30 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 28 04:47:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 28 04:47:13 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 04:49:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 04:49:05 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=28 reqQ=0 recA=41, svcEst=9, delay=7948 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 25681:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f2f0953cb00 x1638932356515888/t0(0) o101->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:0/0 lens 600/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 21670:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f2f0953b600 x1638937776262112/t0(0) o101->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:0/0 lens 592/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 25681:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 21670:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages Jul 28 04:51:28 
fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f06853c7b00 x1637108333918656/t0(0) o101->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:24/0 lens 1512/3264 e 0 to 0 dl 1564314684 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 13960:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff8f13d268b050 x1638798485322256/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:23/0 lens 488/408 e 0 to 0 dl 1564314683 ref 2 fl Complete:/0/0 rc 131072/131072 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 25678:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.112.14@o2ib4: deadline 6:2s ago req@ffff8f2f0953c500 x1638883720358432/t0(0) o101->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:24/0 lens 584/0 e 0 to 0 dl 1564314684 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 25678:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 19 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 23619:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564314678/real 1564314678] req@ffff8f32310cda00 x1636748102716784/t0(0) o106->fir-MDT0000@10.9.104.28@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564314685 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 23619:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff8f1abc080050 x1635247177166976/t0(0) 
o3->50589ff6-c33e-a1c3-e1ce-e27ed9cd0c25@10.9.101.48@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564314683 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2ff13f9200 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f32f3dfde00 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f440fe38400 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ff13fd800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1af2c97e00 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: fir-MDT0002-osp-MDT0000: Connection to fir-MDT0002 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0d4afcfc00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 
20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0d4afca600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af1a00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af4200 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348424be00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f212af35c00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06b0739800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e8bcae000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06c0bc5200 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38bd76c000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f93a79a00 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 3, rc: 7 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LNet: 
20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 656 seconds Jul 28 04:51:28 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 14 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f37e4fe3a00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f37e4fe4400 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34efd1ee00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34efd19600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3fdae3dc00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f34efd1da00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 21545:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f13d2689850 x1639513046877616/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:18/0 lens 488/440 e 0 to 0 dl 1564314708 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f32f3dfae00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af6c00 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 24570:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.24.21@o2ib6 from 10.0.10.51@o2ib7 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 24570:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 26 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: 
LustreError: 24570:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f42a0ec5600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2fcc1ee200 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06c0bc0800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 49465:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3c266fd600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af4e00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0d4afca800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2fcc1ed200 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af2a00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0d4afcc400 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f42a0ec5200 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f36216a6c00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f36216a7600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38bd76c000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 21736:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34f7e74200 Jul 28 04:51:28 fir-md1-s1 
kernel: LustreError: 48196:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f31749f3200 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06b0739800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e80729c00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 21037:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2802410e00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f38dcf65000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f212af31400 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f34efd19000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f37e4fe3c00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f37e4fe5800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f440fe39800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f06c0bc7600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1af2c95a00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 46528:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34f7e75c00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 21735:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2f36b35e00 Jul 28 04:51:28 fir-md1-s1 kernel: 
LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af5400 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af5600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f42eccba600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1af2c96600 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3fdae3fc00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f34efd1ea00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3fdae3c000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3fdae39000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f12ff63a000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f39a63f2000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af2800 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3df66eac00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1dd9caa000 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15e8af5e00 Jul 28 04:51:28 fir-md1-s1 kernel: 
LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f06b073de00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1276ad1e00 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 29832:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2e61846050 x1635692303800336/t0(0) o4->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:18/0 lens 488/448 e 0 to 0 dl 1564314708 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc = -110 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 49477:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f42a0ec5e00 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 23619:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.7.8@o2ib6 from 10.0.10.51@o2ib7 Jul 28 04:51:28 fir-md1-s1 kernel: LNetError: 23619:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 276 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 22432:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2b23bd0c50 x1635692303800304/t0(0) o4->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:18/0 lens 488/448 e 0 to 0 dl 1564314708 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:28 fir-md1-s1 kernel: LustreError: 22432:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 21894:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. 
req@ffff8f142ab7a700 x1636469136766304/t0(0) o37->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:24/0 lens 448/408 e 0 to 0 dl 1564314684 ref 1 fl Complete:/0/0 rc -110/-110 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 21894:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 53 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 49479:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=27, svcEst=20, delay=5823 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 49479:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 4 previous similar messages Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 49479:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f3db5f50c50 x1638932356515920/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:25/0 lens 488/440 e 0 to 0 dl 1564314685 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:51:28 fir-md1-s1 kernel: Lustre: 49479:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 60 previous similar messages Jul 28 04:51:30 fir-md1-s1 kernel: LustreError: 79335:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f432e8fac50 x1638872292681664/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:30 fir-md1-s1 kernel: LustreError: 21390:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+7s req@ffff8f21b7f6f450 x1638955070327248/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564314683 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:30 fir-md1-s1 kernel: LustreError: 21390:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 37 previous similar messages Jul 28 04:51:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 
524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 28 04:51:30 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 04:51:30 fir-md1-s1 kernel: Lustre: 21390:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:7s); client may timeout. req@ffff8f21b7f6f450 x1638955070327248/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564314683 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:51:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with b4e5ca84-188b-ba39-2c33-303c95b2d77a (at 10.8.11.24@o2ib6), client will retry: rc = -110 Jul 28 04:51:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 04:51:30 fir-md1-s1 kernel: LustreError: 79335:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 22 previous similar messages Jul 28 04:51:32 fir-md1-s1 kernel: Lustre: 24572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e61842c50 x1639194167622544/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:32 fir-md1-s1 kernel: Lustre: 24572:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 28 04:51:32 fir-md1-s1 kernel: Lustre: 79335:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f43b3ba9050 x1638932356514560/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:32 fir-md1-s1 kernel: Lustre: 79335:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jul 28 04:51:33 fir-md1-s1 kernel: LustreError: 46528:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2ed233e450 x1631681815814288/t0(0) 
o4->a501b92b-e7b6-1a0d-e95a-8363a690f102@10.8.11.28@o2ib6:16/0 lens 504/448 e 1 to 0 dl 1564314706 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:33 fir-md1-s1 kernel: LustreError: 46528:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 28 04:51:33 fir-md1-s1 kernel: Lustre: 49479:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f364eb99450 x1638798485323200/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564314698 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:33 fir-md1-s1 kernel: Lustre: 49479:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Jul 28 04:51:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b9b7d443-6e99-c10b-4d68-3e3fa30c5530 (at 10.9.113.5@o2ib4), client will retry: rc -110 Jul 28 04:51:34 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 28 04:51:36 fir-md1-s1 kernel: Lustre: 24566:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e61846450 x1638783918613056/t0(0) o3->64cd7216-d693-ed6b-ee4d-6e372402c9ad@10.8.27.6@o2ib6:11/0 lens 488/440 e 1 to 0 dl 1564314701 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:36 fir-md1-s1 kernel: Lustre: 24566:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 28 04:51:37 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f27b76e7050 x1631635010399808/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:37 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 28 04:51:38 fir-md1-s1 kernel: Lustre: 49474:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated 
(20:1s); client may timeout. req@ffff8f2611423850 x1638793031955648/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:51:38 fir-md1-s1 kernel: Lustre: 21792:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f26a5662050 x1640016238029888/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:51:38 fir-md1-s1 kernel: Lustre: 21792:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Jul 28 04:51:41 fir-md1-s1 kernel: LustreError: 49466:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2e61846450 x1638783918613056/t0(0) o3->64cd7216-d693-ed6b-ee4d-6e372402c9ad@10.8.27.6@o2ib6:11/0 lens 488/440 e 1 to 0 dl 1564314701 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:41 fir-md1-s1 kernel: LustreError: 49466:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 32 previous similar messages Jul 28 04:51:42 fir-md1-s1 kernel: Lustre: 49230:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f32f6bb0450 x1634532087788576/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564314707 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -110 Jul 28 04:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -110 Jul 28 04:51:43 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 04:51:43 fir-md1-s1 kernel: Lustre: 46584:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer 
than estimated (20:6s); client may timeout. req@ffff8f432e8fd450 x1638955070327184/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564314697 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:51:43 fir-md1-s1 kernel: Lustre: 46584:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 28 04:51:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 04:51:49 fir-md1-s1 kernel: LustreError: 46553:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1647c23850 x1634519833029248/t0(0) o4->eaf995be-0d27-b013-5e90-e619713af34c@10.8.13.6@o2ib6:18/0 lens 504/448 e 0 to 0 dl 1564314708 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:49 fir-md1-s1 kernel: LustreError: 21736:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 23+0s req@ffff8f26a5660c50 x1634924122637840/t0(0) o4->8c55cb94-7e98-7ab0-0640-ed020030cf15@10.8.30.21@o2ib6:19/0 lens 504/448 e 0 to 0 dl 1564314709 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:49 fir-md1-s1 kernel: LustreError: 21736:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 39 previous similar messages Jul 28 04:51:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8c55cb94-7e98-7ab0-0640-ed020030cf15 (at 10.8.30.21@o2ib6), client will retry: rc = -110 Jul 28 04:51:49 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 28 04:51:49 fir-md1-s1 kernel: LustreError: 46553:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 28 04:51:51 fir-md1-s1 kernel: Lustre: 21535:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2c8331e050 x1631458725740768/t0(0) o3->bb0a5132-bb89-b076-3d1c-a0a716c38321@10.8.12.3@o2ib6:26/0 lens 488/440 e 0 to 0 dl 1564314716 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:51:51 fir-md1-s1 kernel: Lustre: 21535:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 53 previous similar 
messages Jul 28 04:51:52 fir-md1-s1 kernel: Lustre: 20505:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:5s); client may timeout. req@ffff8f27b76e4450 x1640016238029952/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1564314707 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 04:51:52 fir-md1-s1 kernel: Lustre: 20505:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Jul 28 04:52:02 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 44s: evicting client at 10.9.101.15@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1b9f547980/0x5d9ee690df000d7b lrc: 3/0,0 mode: PW/PW res: [0x200029f07:0x62d2:0x0].0x0 bits 0x40/0x0 rrc: 41 type: IBT flags: 0x60200400000020 nid: 10.9.101.15@o2ib4 remote: 0x1bcae83b365bb7dc expref: 7714 pid: 97672 timeout: 3429782 lvb_type: 0 Jul 28 04:52:02 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 9 previous similar messages Jul 28 04:52:16 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:53s); client may timeout. 
req@ffff8f0f88228f00 x1634147214170192/t0(0) o101->cc4008f6-fb0a-3a63-7de5-6cb4e06911a9@10.9.101.44@o2ib4:23/0 lens 480/536 e 0 to 0 dl 1564314683 ref 1 fl Complete:/0/0 rc 0/0 Jul 28 04:52:16 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Jul 28 04:52:22 fir-md1-s1 kernel: LustreError: 23757:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f17e7784400 ns: mdt-fir-MDT0000_UUID lock: ffff8f0d2736c5c0/0x5d9ee690df01dcc4 lrc: 3/0,0 mode: PW/PW res: [0x200029f07:0x62d2:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x50200400000020 nid: 10.9.101.15@o2ib4 remote: 0x1bcae83b365bb80d expref: 2 pid: 23757 timeout: 0 lvb_type: 0 Jul 28 04:52:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a890aaf8-05fd-cdba-39fc-201f06d6890d (at 10.9.108.55@o2ib4) reconnecting Jul 28 04:52:30 fir-md1-s1 kernel: Lustre: Skipped 1447 previous similar messages Jul 28 04:52:41 fir-md1-s1 kernel: Lustre: 26257:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f24b6e4c800 x1631701444905728/t0(0) o101->be419174-dade-262e-7149-5b03d1650211@10.9.101.36@o2ib4:16/0 lens 480/568 e 0 to 0 dl 1564314766 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 04:52:41 fir-md1-s1 kernel: Lustre: 26257:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 28 04:53:46 fir-md1-s1 kernel: LustreError: 21447:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564314736, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1c514b0000/0x5d9ee690df647693 lrc: 3/0,1 mode: --/PW res: [0x200029f07:0x62d2:0x0].0x0 bits 0x40/0x0 rrc: 36 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21447 timeout: 0 lvb_type: 0 Jul 28 04:53:48 fir-md1-s1 kernel: LustreError: 23580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) 
### lock timed out (enqueued at 1564314738, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0af7223600/0x5d9ee690df6d0642 lrc: 3/0,1 mode: --/PW res: [0x200029f07:0x62d2:0x0].0x0 bits 0x40/0x0 rrc: 36 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23580 timeout: 0 lvb_type: 0 Jul 28 04:53:52 fir-md1-s1 kernel: LustreError: 97640:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564314742, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f22904dde80/0x5d9ee690df87b40b lrc: 3/0,1 mode: --/PW res: [0x200029f07:0x62d2:0x0].0x0 bits 0x40/0x0 rrc: 36 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97640 timeout: 0 lvb_type: 0 Jul 28 04:53:52 fir-md1-s1 kernel: LustreError: 97640:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jul 28 04:57:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 04:57:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 04:58:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 04:58:13 fir-md1-s1 kernel: Lustre: Skipped 672 previous similar messages Jul 28 04:59:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 04:59:21 fir-md1-s1 kernel: Lustre: Skipped 2137 previous similar messages Jul 28 05:02:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 05:02:33 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 28 05:08:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 05:08:52 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 28 05:09:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 05:09:23 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 28 05:09:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 05:09:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 05:12:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 05:12:46 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 28 05:18:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 05:18:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 28 05:19:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 05:19:44 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 28 05:19:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3edb11b7-1b4e-5651-a19b-5b0b407d9019 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3d6d7db800, cur 1564316399 expire 1564316249 last 1564316172 Jul 28 05:20:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f514cc7a-9bbf-6a9c-dfda-7e21d4d17fbe (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2d1689d000, cur 1564316410 expire 1564316260 last 1564316183 Jul 28 05:20:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 05:23:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 05:23:23 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 28 05:26:02 fir-md1-s1 kernel: Lustre: 23738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f27c0eebf00 x1632261162946704/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:7/0 lens 376/1600 e 0 to 0 dl 1564316767 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 05:26:02 fir-md1-s1 kernel: Lustre: 23738:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 28 05:28:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 05:28:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 05:29:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 05:29:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 05:29:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 05:29:48 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 28 05:33:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 05:33:37 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 28 05:33:47 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 28 05:33:47 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 32 previous similar messages Jul 28 05:39:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 05:39:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 05:39:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 28 05:39:25 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 05:39:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 05:39:51 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 28 05:43:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 05:43:55 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 28 05:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 05:50:03 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 28 05:51:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 28 05:51:02 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 05:54:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 05:54:46 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 28 05:56:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 05:56:01 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 06:00:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 06:00:30 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 28 06:02:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 06:02:11 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 28 06:04:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 06:04:59 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 28 06:10:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 06:10:41 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 28 06:11:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 06:11:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 06:12:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 06:12:36 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 06:15:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 06:15:13 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 28 06:21:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 06:21:21 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 06:22:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 06:22:31 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 06:22:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 06:22:42 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 06:26:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 06:26:11 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 28 06:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 06:31:27 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 28 06:32:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 06:32:53 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 28 06:33:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 06:36:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 06:36:11 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 06:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 06:41:58 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 28 06:43:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 06:43:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 06:46:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 06:46:26 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 06:52:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 06:52:07 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 28 06:53:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 06:53:22 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 28 06:56:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 06:56:43 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 07:02:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 07:02:53 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 28 07:04:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:04:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 07:04:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 07:04:09 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 28 07:05:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:05:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 07:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 07:06:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 07:08:16 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 28 07:13:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 07:13:07 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 28 07:14:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 07:14:36 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 28 07:17:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a2b25ba8-28ed-6323-9536-30692c0dfb2e (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f147a472c00, cur 1564323445 expire 1564323295 last 1564323218 Jul 28 07:17:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 07:17:25 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 07:20:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:21:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 07:23:21 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 28 07:26:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 07:26:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 07:27:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 07:27:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 07:27:42 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 07:32:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:33:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 07:33:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 07:33:22 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 28 07:33:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:37:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 07:37:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 07:37:44 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 07:39:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 07:39:22 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 28 07:43:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 07:43:26 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 28 07:48:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 07:48:24 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 07:49:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 07:49:24 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 07:54:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 07:54:15 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 07:58:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 07:58:45 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 07:59:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 07:59:58 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 08:00:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:04:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:04:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 08:04:26 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 28 08:08:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 08:08:56 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 28 08:13:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 08:13:37 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 28 08:14:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 08:14:28 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 28 08:15:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 08:16:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:16:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:19:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 08:19:08 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 28 08:19:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:22:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 08:24:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 08:24:31 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 08:25:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 08:25:57 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 28 08:29:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 08:29:17 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 08:32:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 28 08:32:51 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f33e4702450 x1638902312721456/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564327971 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 08:32:51 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 17 previous similar messages Jul 28 08:32:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 0, rc: 7 Jul 28 08:32:51 fir-md1-s1 kernel: LNetError: 20207:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 28 08:32:51 fir-md1-s1 kernel: Lustre: 20239:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564327971/real 1564327971] req@ffff8f426b2ca700 x1636748154810000/t0(0) o13->fir-OST001c-osc-MDT0002@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564327978 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 28 08:32:51 fir-md1-s1 kernel: Lustre: 20239:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous 
similar message Jul 28 08:32:51 fir-md1-s1 kernel: Lustre: fir-OST001c-osc-MDT0002: Connection to fir-OST001c (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 28 08:32:51 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3235621400 Jul 28 08:32:51 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22f3761400 Jul 28 08:32:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 23dbfbee-8f3b-27e7-f711-fd69cc641360 (at 10.9.115.10@o2ib4), client will retry: rc -110 Jul 28 08:32:52 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 28 08:32:52 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.105@o2ib7: 614 seconds Jul 28 08:32:52 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 33 previous similar messages Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e8bcac000 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34f7e73200 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f34f7e72400 Jul 28 08:32:52 fir-md1-s1 kernel: LNetError: 46562:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.15.8@o2ib6 from 10.0.10.51@o2ib7 Jul 28 08:32:52 fir-md1-s1 kernel: LNetError: 46562:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 9 previous similar messages Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f366cca8000 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status 
-5, desc ffff8f0724a51400 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 49465:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f308a54a600 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 46521:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f39a3f40c00 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20506:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f38dcf66600 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 21389:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2318c1ca00 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 44039:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2529f22000 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f299d3d7200 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39a3f47000 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f19452a0400 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f19452a2c00 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2529f24400 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f38dcf63600 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2e8bcae000 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1345a89000 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 
ffff8f315bcdd800 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3235627400 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3235623400 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e189e8c00 Jul 28 08:32:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1e061bba00 Jul 28 08:32:52 fir-md1-s1 kernel: LNetError: 20207:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Jul 28 08:32:53 fir-md1-s1 kernel: LustreError: 21365:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1b53497450 x1636581126050128/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564327985 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:32:53 fir-md1-s1 kernel: LustreError: 21365:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 28 08:32:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 28 08:32:54 fir-md1-s1 kernel: LustreError: 38767:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3f8e670450 x1639513570249312/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:16/0 lens 488/440 e 0 to 0 dl 1564327996 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:32:54 fir-md1-s1 kernel: LustreError: 38767:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 27 previous similar messages Jul 28 08:32:54 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 28 08:32:56 fir-md1-s1 kernel: LustreError: 44040:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1b53492050 x1638833263738800/t0(0) 
o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564327986 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:32:56 fir-md1-s1 kernel: LustreError: 44040:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 60 previous similar messages Jul 28 08:32:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6), client will retry: rc -110 Jul 28 08:32:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 28 08:33:00 fir-md1-s1 kernel: LustreError: 46533:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1c05472450 x1633755073278352/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:16/0 lens 488/440 e 0 to 0 dl 1564327996 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:00 fir-md1-s1 kernel: LustreError: 46533:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 15 previous similar messages Jul 28 08:33:00 fir-md1-s1 kernel: Lustre: 35236:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c16331050 x1638793518836128/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564327985 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:01 fir-md1-s1 kernel: Lustre: 22226:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f430e653050 x1638799009344000/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564327986 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:01 fir-md1-s1 kernel: Lustre: 22226:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 28 08:33:06 fir-md1-s1 kernel: LustreError: 21543:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 14+0s req@ffff8f22a2bf4c50 x1638793518836496/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564327985 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 28 08:33:06 fir-md1-s1 kernel: LustreError: 21543:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 14 previous similar messages Jul 28 08:33:06 fir-md1-s1 kernel: Lustre: 21543:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f22a2bf4c50 x1638793518836496/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564327985 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 08:33:06 fir-md1-s1 kernel: Lustre: 21543:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 28 08:33:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ca693efe-e963-3124-a59d-0beac55f4de3 (at 10.9.112.17@o2ib4), client will retry: rc -110 Jul 28 08:33:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 28 08:33:08 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f28da77c450 x1638764287707408/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564327986 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:08 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 18 previous similar messages Jul 28 08:33:10 fir-md1-s1 kernel: Lustre: 46515:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f275132a450 x1631610889729488/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:15/0 lens 488/440 e 0 to 0 dl 1564327995 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:10 fir-md1-s1 kernel: Lustre: 46515:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Jul 28 08:33:16 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 24+0s req@ffff8f22c4ba4850 x1631610889729504/t0(0) 
o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:15/0 lens 488/440 e 0 to 0 dl 1564327995 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:16 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 13 previous similar messages Jul 28 08:33:16 fir-md1-s1 kernel: Lustre: 21453:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f22c4ba4850 x1631610889729504/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:15/0 lens 488/440 e 0 to 0 dl 1564327995 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 08:33:16 fir-md1-s1 kernel: Lustre: 21453:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Jul 28 08:33:16 fir-md1-s1 kernel: Lustre: 21389:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20af7d2850 x1638719566599104/t0(0) o4->ceadc533-2b20-ce35-943b-e716e933f51a@10.8.23.15@o2ib6:21/0 lens 504/448 e 0 to 0 dl 1564328001 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 08:33:16 fir-md1-s1 kernel: Lustre: 21389:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jul 28 08:33:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 83887939-6757-4aea-8b88-f0aa38eb91bc (at 10.9.108.13@o2ib4), client will retry: rc = -110 Jul 28 08:33:18 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 28 08:33:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 4e9291ef-9090-67be-2550-94052940879c (at 10.9.102.72@o2ib4), client will retry: rc = -110 Jul 28 08:33:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 08:33:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with bec1b57c-73f8-9411-510b-0f3c3cf1422b (at 10.8.10.6@o2ib6), client will retry: rc = -110 Jul 28 08:33:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 08:34:40 fir-md1-s1 kernel: Lustre: MGS: 
Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 08:34:40 fir-md1-s1 kernel: Lustre: Skipped 373 previous similar messages Jul 28 08:36:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 08:36:05 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 28 08:37:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:38:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 08:39:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 08:39:52 fir-md1-s1 kernel: Lustre: Skipped 284 previous similar messages Jul 28 08:45:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 08:45:04 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 28 08:45:49 fir-md1-s1 kernel: Lustre: 49252:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06b4c4e050 x1634532507648016/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564328754 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 08:45:49 fir-md1-s1 kernel: Lustre: 49252:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 28 08:46:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 08:46:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 28 08:46:07 fir-md1-s1 kernel: LustreError: 
20500:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -13+13s req@ffff8f06b4c4e050 x1634532507648016/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564328754 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:46:07 fir-md1-s1 kernel: LustreError: 20500:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 28 08:46:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with bf0fab1f-ed86-800d-24d6-23f47310966d (at 10.9.113.8@o2ib4), client will retry: rc -110 Jul 28 08:46:07 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 28 08:46:07 fir-md1-s1 kernel: Lustre: 20500:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:13s); client may timeout. req@ffff8f06b4c4e050 x1634532507648016/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564328754 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 08:46:07 fir-md1-s1 kernel: Lustre: 20500:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 28 08:47:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 08:48:38 fir-md1-s1 kernel: LustreError: 21536:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f28cbc11450 x1634532509538032/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:16/0 lens 488/440 e 0 to 0 dl 1564328926 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 08:48:38 fir-md1-s1 kernel: LustreError: 21536:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 28 08:48:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with bf0fab1f-ed86-800d-24d6-23f47310966d (at 10.9.113.8@o2ib4), client will retry: rc -110 Jul 28 08:50:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 08:50:00 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 08:54:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 08:55:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 08:55:07 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 28 08:56:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 08:56:15 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 09:00:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 09:00:05 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 09:05:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 09:05:16 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 09:07:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 09:09:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 09:09:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 28 09:10:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 09:10:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 28 09:11:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 09:15:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 09:15:17 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 28 09:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 09:20:29 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 09:21:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 09:21:18 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 28 09:22:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 09:24:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 09:24:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 09:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 09:25:49 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 09:30:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 09:31:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 09:31:15 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 09:31:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 09:31:22 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 09:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 09:35:51 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 28 09:36:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 09:37:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 09:41:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 28 09:41:33 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 28 09:41:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 09:41:41 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 09:45:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 09:45:57 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 28 09:51:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 09:51:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 09:52:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 09:52:02 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 28 09:52:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 09:52:24 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 09:55:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 09:56:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 09:56:15 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 28 09:56:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 10:02:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 10:02:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 10:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 10:02:57 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 10:04:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 10:04:01 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 28 10:05:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 10:05:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 10:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 10:06:22 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 28 10:12:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fdfd6400, cur 1564333970 expire 1564333820 last 1564333743 Jul 28 10:12:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 10:13:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 10:13:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 10:13:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 10:13:18 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 10:14:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 10:14:02 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 10:16:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 10:16:36 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 28 10:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 10:23:18 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 10:24:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 28 10:24:40 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 28 10:26:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 10:26:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 10:26:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 10:26:42 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 22432:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.16.3@o2ib6: deadline 6:3s ago req@ffff8f35033eac50 x1631590529220448/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:27/0 lens 488/0 e 0 to 0 dl 1564335027 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 22432:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 22432:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff8f35033eac50 x1631590529220448/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:27/0 lens 488/0 e 0 to 0 dl 1564335027 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 46542:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=32 reqQ=0 recA=45, svcEst=1, delay=8304 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 46542:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f38cda13450 x1639194807495776/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564335027 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 46587:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 9s req@ffff8f1f5e043c50 x1638079843157008/t0(0) o3->f0a8fbb7-06c4-ed16-a94f-6cea310ceb29@10.8.0.82@o2ib6:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 23569:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564335021/real 1564335021] req@ffff8f10b3638600 x1636748202820976/t0(0) o1000->fir-MDT0002-osp-MDT0000@0@lo:24/4 lens 304/4320 e 0 to 1 dl 1564335028 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 23569:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0002-osp-MDT0000: Connection to fir-MDT0002 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 25633:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff8f210d32f050 x1631635913843344/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564335027 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 25633:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 3 seconds Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) 
Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 2, oc: 0, rc: 5 Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.108@o2ib7: 3 seconds Jul 28 10:30:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 19 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f325fa21400 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a40800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2dfe302a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2975db8400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f40ecac5400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2975db8c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f325fa20200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e8bcab800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 
20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18a4216200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf67800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b1035da00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2707b83200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f275193a000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348424dc00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f090e261600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2975db9000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f325fa24000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f14e257ac00 Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 27604:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.28.12@o2ib6 from 10.0.10.51@o2ib7 Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 27604:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 12 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 69437:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3b1035de00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 22156:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2975db9000 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 22670:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3235627800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f23fd5bbe00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 44038:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3b10359800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 49472:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f308a54fe00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f168b4cd200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f40ecac0600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0633dc7a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e8bcaee00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a45c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2751938600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca58e000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3484248600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca58d000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3484248000 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f325fa25a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39a3f43000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f299d3d3e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348424c600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c266fae00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22f3767a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b7faa5e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f83a9da00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f366cca9a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f325fa27600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf62c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f325fa22e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f168b4c8200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f275193cc00 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34f41bcc00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2e8bcadc00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f14e257f400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f230639f200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 46552:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f3b10359800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 21716:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2975db8c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 6549:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f275193cc00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 49462:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2751938600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2e8bcaf200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2dfe303000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf61200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a40c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf63c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a44000 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18a4211a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348424be00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca589c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348424e400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f14e257ee00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1178211400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34c63dea00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca58d200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f23fd5bb400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2707b81600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f38dcf67200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f14e257ae00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f366ccaac00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a0a24ba00 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b1035ea00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f393d62da00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2707b83200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a40e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a0a249a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34f41bae00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3f4d52d000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ec9b2cc00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3f4d52d600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f40ecac2800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf60e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf60000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3b7faa5a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2751938c00 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2975dbc200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3b7faa4600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348424f800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2311a41c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3b7faa2600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2dfe304000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c266fba00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c266fde00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f83a9aa00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f83a9d400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b7faa4000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a0a24a600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf67800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22f3767000 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18a4215200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f12961b4200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f312cfc5a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a0a249600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1178214e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f366ccae400 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f230639da00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a41200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2311a46800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f37e4fe7c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0633dc2c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2751938000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18a4217000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0cfde72600 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3f4d52aa00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca58bc00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39a3f47c00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f14e257f000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f312cfc0000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b7faa3e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a0a24ac00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21088e7e00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f83a98a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f42eccbf200 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c266fea00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca58a000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2306398600 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3c266fd400 Jul 28 10:30:32 fir-md1-s1 kernel: 
LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39a3f42a00 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2270385800 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0dca58d000 Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a0a248000 Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 21679:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.1.18@o2ib6 from 10.0.10.51@o2ib7 Jul 28 10:30:32 fir-md1-s1 kernel: LNetError: 21679:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 12 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 23106:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=64 reqQ=0 recA=28, svcEst=20, delay=8340 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 23106:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 11 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 23106:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f35033eac50 x1631590529220448/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:27/0 lens 488/0 e 0 to 0 dl 1564335027 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: 23106:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 129 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 46563:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1f5e044850 x1638869009497536/t0(0) o3->8df94149-5690-262d-f805-cc7898f99b40@10.8.16.5@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 8df94149-5690-262d-f805-cc7898f99b40 (at 10.8.16.5@o2ib6), client will retry: rc -110 Jul 28 10:30:32 fir-md1-s1 kernel: Lustre: Skipped 117 previous similar messages Jul 28 10:30:32 fir-md1-s1 kernel: LustreError: 46563:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 28 10:30:35 fir-md1-s1 kernel: LustreError: 46511:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+8s req@ffff8f33df386c50 x1638723154796816/t0(0) o3->cc7042ec-251b-fe73-dc93-9545d29323f6@10.8.27.21@o2ib6:27/0 lens 488/440 e 0 to 0 dl 1564335027 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 28 10:30:35 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 28 10:30:35 fir-md1-s1 kernel: Lustre: 22431:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:8s); client may timeout. 
req@ffff8f37cdc18850 x1638799296505104/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564335027 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 10:30:35 fir-md1-s1 kernel: Lustre: 22431:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 179 previous similar messages Jul 28 10:30:35 fir-md1-s1 kernel: LustreError: 46511:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 119 previous similar messages Jul 28 10:30:37 fir-md1-s1 kernel: Lustre: 21447:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564335030/real 1564335030] req@ffff8f12ce941800 x1636748202820848/t0(0) o104->fir-MDT0002@10.8.17.26@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564335037 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 28 10:30:37 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:10s); client may timeout. req@ffff8f1ef9386f00 x1638281262727776/t0(0) o55->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:27/0 lens 472/192 e 0 to 0 dl 1564335027 ref 1 fl Complete:/0/0 rc -22/-22 Jul 28 10:30:37 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Jul 28 10:30:37 fir-md1-s1 kernel: Lustre: 21447:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Jul 28 10:30:45 fir-md1-s1 kernel: Lustre: 22958:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2bca255050 x1631635913844432/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:20/0 lens 488/440 e 1 to 0 dl 1564335050 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:45 fir-md1-s1 kernel: LustreError: 46552:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2254bfa450 x1638884734644912/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 28 10:30:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with efb86e40-78e4-0377-026b-476ce03a25a4 (at 10.8.28.1@o2ib6), client will retry: rc -110 Jul 28 10:30:45 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 28 10:30:46 fir-md1-s1 kernel: Lustre: 24566:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f21807050 x1638888132382208/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:46 fir-md1-s1 kernel: Lustre: 24566:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Jul 28 10:30:48 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2009fb1e00 x1640213061637024/t0(0) o101->296d97ff-0de3-b3eb-25b6-28238cfb0a2e@10.8.9.8@o2ib6:23/0 lens 480/568 e 1 to 0 dl 1564335053 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:48 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 31 previous similar messages Jul 28 10:30:50 fir-md1-s1 kernel: LustreError: 46542:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f38cda13050 x1638826909452192/t0(0) o3->f7baec68-f8c8-0730-9508-ba1e77698953@10.9.114.6@o2ib4:20/0 lens 488/440 e 1 to 0 dl 1564335050 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 28 10:30:50 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 28 10:30:50 fir-md1-s1 kernel: LustreError: 46542:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 28 10:30:51 fir-md1-s1 kernel: Lustre: 23607:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time 
(5/5), not sending early reply req@ffff8f07bd1d2d00 x1633737457823008/t0(0) o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:26/0 lens 480/568 e 1 to 0 dl 1564335056 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:51 fir-md1-s1 kernel: Lustre: 23607:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 28 10:30:52 fir-md1-s1 kernel: Lustre: 24569:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f2bca251850 x1634980838694800/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 10:30:52 fir-md1-s1 kernel: Lustre: 24569:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 28 10:30:52 fir-md1-s1 kernel: LustreError: 21711:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 21+1s req@ffff8f0e03481050 x1639157288330512/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:52 fir-md1-s1 kernel: LustreError: 21711:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 38 previous similar messages Jul 28 10:30:55 fir-md1-s1 kernel: Lustre: 27587:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2254bfb850 x1638909754369232/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564335060 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:59 fir-md1-s1 kernel: LustreError: 27482:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+8s req@ffff8f1f5e041050 x1638888589928432/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 10:30:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 
5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 28 10:30:59 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 10:30:59 fir-md1-s1 kernel: Lustre: 46567:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:8s); client may timeout. req@ffff8f210d32a050 x1635714895183552/t0(0) o3->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564335051 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 10:30:59 fir-md1-s1 kernel: Lustre: 46567:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 28 10:30:59 fir-md1-s1 kernel: LustreError: 27482:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 8 previous similar messages Jul 28 10:33:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 10:33:22 fir-md1-s1 kernel: Lustre: Skipped 654 previous similar messages Jul 28 10:34:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.8.3@o2ib6, removing former export from same NID Jul 28 10:34:40 fir-md1-s1 kernel: Lustre: Skipped 342 previous similar messages Jul 28 10:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 10:37:05 fir-md1-s1 kernel: Lustre: Skipped 1003 previous similar messages Jul 28 10:41:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 10:41:40 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 28 10:43:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 10:43:44 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 10:44:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 10:44:51 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 10:48:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 10:48:05 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 10:54:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 10:54:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 10:56:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 10:56:54 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 10:58:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 10:58:12 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 28 11:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 11:04:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 11:08:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 28 11:08:03 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 28 11:08:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 11:08:30 fir-md1-s1 kernel: Lustre: Skipped 78 
previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 35239:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=14 reqQ=0 recA=48, svcEst=20, delay=8233 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 35239:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f080fb14050 x1638833483982000/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:27/0 lens 488/0 e 0 to 0 dl 1564337367 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 35239:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 63 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 46528:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.112.17@o2ib4: deadline 6:2s ago req@ffff8f28a1b3e850 x1638793864713552/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:27/0 lens 488/0 e 0 to 0 dl 1564337367 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 46528:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 46528:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. 
req@ffff8f28a1b3e850 x1638793864713552/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:27/0 lens 488/0 e 0 to 0 dl 1564337367 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 23575:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff8f06d27df200 x1638909848929632/t0(0) o101->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 608/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 46528:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 23575:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 13 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 21485:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f0ece417450 x1639513943984864/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564337367 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 21485:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 9 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 11:09:30 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 97 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f43f9dcee00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2b16660e00 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4), client will retry: rc -110 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar 
messages Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 20245:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564337361/real 1564337369] req@ffff8f38acd65400 x1636748234648080/t0(0) o13->fir-OST000d-osc-MDT0002@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564337368 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 20245:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: fir-OST000d-osc-MDT0002: Connection to fir-OST000d (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f43f9dcd800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2420884c00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b97fdda00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f366ccafe00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1228d1d800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f168b4cd800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a08200 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a0b200 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) 
event type 5, status -125, desc ffff8f214594ea00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f393d62aa00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f366ccada00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e8bcac000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a0d800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a08000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f308fa8c000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f230639c400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f366ccaaa00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1228d1ea00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2420883400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a09800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f23fd5bba00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f276c038400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, 
status -5, desc ffff8f393d62b200 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2707b83e00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3b1035a800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2420884000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f34726fd400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e8072d800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a0c200 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2420884e00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a0a000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f170632f000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1228d1c000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0b97fde200 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f230639fc00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1706328400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 
ffff8f3e18a0d400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f42eccbda00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f38cdbc1000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f088ce29a00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1f83a9c000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2420884400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f30d3c94a00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2751938a00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f275193c800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f308a54dc00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a08800 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f170632e600 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10e8588a00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3f4d52e000 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
ffff8f308a548200 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e18a0e400 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1f83a9cc00 Jul 28 11:09:30 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1228d1b600 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 46561:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=12 reqQ=0 recA=61, svcEst=20, delay=8189 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 46561:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 11 previous similar messages Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 46561:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f2455bcfc50 x1638833483981904/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:27/0 lens 488/0 e 0 to 0 dl 1564337367 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:09:30 fir-md1-s1 kernel: Lustre: 46561:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 36 previous similar messages Jul 28 11:09:31 fir-md1-s1 kernel: LustreError: 21365:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f21dead0850 x1635692817948592/t0(0) o3->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:20/0 lens 488/440 e 0 to 0 dl 1564337390 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc -110 Jul 28 11:09:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 28 11:09:31 fir-md1-s1 kernel: LustreError: 21365:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 17 previous similar messages Jul 28 11:09:35 fir-md1-s1 
kernel: Lustre: 46548:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3f7ccb6850 x1638253281811392/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564337380 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:35 fir-md1-s1 kernel: Lustre: 46548:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 28 11:09:36 fir-md1-s1 kernel: LustreError: 27482:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f193b7a0050 x1639150234691200/t0(0) o3->ad5b8b9d-f149-444a-fb05-2479a0cbbcd5@10.8.15.10@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564337391 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ad5b8b9d-f149-444a-fb05-2479a0cbbcd5 (at 10.8.15.10@o2ib6), client will retry: rc -110 Jul 28 11:09:36 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 28 11:09:36 fir-md1-s1 kernel: Lustre: 21514:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f4109eb1450 x1634135110282448/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:11/0 lens 488/440 e 1 to 0 dl 1564337381 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:36 fir-md1-s1 kernel: Lustre: 21514:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 28 11:09:37 fir-md1-s1 kernel: LustreError: 27605:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1e44ec3450 x1640016772940832/t0(0) o3->4be13f91-94ff-43a7-d4ac-0956b3c28c36@10.8.16.4@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564337391 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:37 fir-md1-s1 kernel: LustreError: 27605:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 9 previous similar messages Jul 28 11:09:40 fir-md1-s1 kernel: LustreError: 46543:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s 
req@ffff8f420c557450 x1634135110282720/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564337380 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:40 fir-md1-s1 kernel: LustreError: 46543:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 26 previous similar messages Jul 28 11:09:42 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 12+1s req@ffff8f420c552050 x1638833483981824/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:11/0 lens 488/440 e 1 to 0 dl 1564337381 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:42 fir-md1-s1 kernel: LustreError: 22989:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Jul 28 11:09:42 fir-md1-s1 kernel: Lustre: 22989:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f420c552050 x1638833483981824/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:11/0 lens 488/440 e 1 to 0 dl 1564337381 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 11:09:42 fir-md1-s1 kernel: Lustre: 22989:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 47 previous similar messages Jul 28 11:09:44 fir-md1-s1 kernel: Lustre: 59377:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f080fb10450 x1638833483982112/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:19/0 lens 488/440 e 1 to 0 dl 1564337389 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:44 fir-md1-s1 kernel: Lustre: 59377:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 28 11:09:47 fir-md1-s1 kernel: LustreError: 55159:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+7s req@ffff8f425f64a850 x1639513943984128/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564337380 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 
11:09:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 28 11:09:47 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 11:09:47 fir-md1-s1 kernel: Lustre: 46539:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:7s); client may timeout. req@ffff8f3f7ccb6850 x1638253281811392/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564337380 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 11:09:47 fir-md1-s1 kernel: LustreError: 55159:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 28 11:09:52 fir-md1-s1 kernel: Lustre: 49475:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f268495c050 x1638933110996192/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564337390 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 11:09:52 fir-md1-s1 kernel: Lustre: 49475:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 28 11:09:52 fir-md1-s1 kernel: LustreError: 21451:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f115e20c850 x1631547901184400/t0(0) o4->aec69d6f-8b9d-1fe2-74fb-aa6ac6ee7bb1@10.9.106.63@o2ib4:29/0 lens 504/448 e 0 to 0 dl 1564337399 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:52 fir-md1-s1 kernel: LustreError: 21451:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 7 previous similar messages Jul 28 11:09:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with aec69d6f-8b9d-1fe2-74fb-aa6ac6ee7bb1 (at 10.9.106.63@o2ib4), client will retry: rc = -110 Jul 28 11:09:54 fir-md1-s1 kernel: Lustre: 21987:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2909e59c50 
x1638884666685040/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:29/0 lens 488/440 e 0 to 0 dl 1564337399 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:54 fir-md1-s1 kernel: Lustre: 21987:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 148 previous similar messages Jul 28 11:09:56 fir-md1-s1 kernel: LustreError: 21996:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+6s req@ffff8f35e97ba850 x1638909848928656/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:20/0 lens 488/440 e 0 to 0 dl 1564337390 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:09:56 fir-md1-s1 kernel: LustreError: 21996:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 121 previous similar messages Jul 28 11:10:01 fir-md1-s1 kernel: Lustre: 46556:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f2909e59c50 x1638884666685040/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:29/0 lens 488/440 e 0 to 0 dl 1564337399 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 11:10:01 fir-md1-s1 kernel: Lustre: 46556:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 26 previous similar messages Jul 28 11:13:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 11:13:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 11:14:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 11:14:49 fir-md1-s1 kernel: Lustre: Skipped 1956 previous similar messages Jul 28 11:15:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 11:18:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 11:18:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 11:18:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 11:18:49 fir-md1-s1 kernel: Lustre: Skipped 2988 previous similar messages Jul 28 11:20:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 11:20:06 fir-md1-s1 kernel: Lustre: Skipped 1019 previous similar messages Jul 28 11:25:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 11:25:11 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 28 11:26:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 11:26:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 11:28:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 11:28:50 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 28 11:30:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 11:30:46 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 28 11:35:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 11:35:12 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 11:39:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 11:39:06 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 11:41:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 11:41:26 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 28 11:45:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 11:45:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 11:45:37 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). 
Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 49462:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=26 reqQ=0 recA=31, svcEst=20, delay=7089 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 49462:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 49473:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.1@o2ib4: deadline 6:1s ago req@ffff8f325554d850 x1639194924229296/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:23/0 lens 488/0 e 0 to 0 dl 1564339583 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 24564:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.114.15@o2ib4: deadline 6:1s ago req@ffff8f2909434850 x1638253354453168/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:23/0 lens 488/0 e 0 to 0 dl 1564339583 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 20986:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f0e8fb46900 x1638909938570112/t0(0) o35->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 49462:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f26dd809c50 x1636581343412816/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:23/0 lens 488/0 e 0 to 0 dl 1564339583 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 20464:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f31ee48da00 x1637108373602496/t0(0) o101->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:23/0 lens 592/3264 e 0 to 0 dl 1564339583 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 49473:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 14 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 24564:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 14 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 20986:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 4 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 49462:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 20464:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 46520:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. req@ffff8f2dc8116c50 x1639194924229408/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:23/0 lens 488/0 e 0 to 0 dl 1564339583 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 22058:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8f2594c3d050 x1638799454086176/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:23/0 lens 488/0 e 0 to 0 dl 1564339583 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 46520:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 22058:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 1, rc: 8 Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 24571:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f2dc8116450 x1631611040299648/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:22/0 lens 488/440 e 0 to 0 dl 1564339582 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 24571:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 19 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 3 seconds Jul 28 11:46:25 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 17 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 18 previous similar messages 
Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f43f9dcd800 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f391cdf2000 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f4229a8c200 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: Skipped 136 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 10143:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.29.2@o2ib6 from 10.0.10.51@o2ib7 Jul 28 11:46:25 fir-md1-s1 kernel: LNetError: 10143:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 2 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3cf3aad200 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f37e4fe0600 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f348424d000 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0808244e00 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3763660400 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f276c039600 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f203f5f6400 Jul 28 11:46:25 fir-md1-s1 kernel: 
LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3c266f8400 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f348424be00 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0808242a00 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f203f5f2c00 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 20222:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564339584/real 1564339584] req@ffff8f2149447800 x1636748247202016/t0(0) o13->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564339591 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 20222:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: fir-OST0007-osc-MDT0002: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0808247e00 Jul 28 11:46:25 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f203f5f1e00 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 23646:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=1, svcEst=1, delay=6751 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 23646:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 8 previous similar messages Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 
23646:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f420d88f200 x1635247421816896/t0(0) o36->50589ff6-c33e-a1c3-e1ce-e27ed9cd0c25@10.9.101.48@o2ib4:22/0 lens 512/2888 e 0 to 0 dl 1564339582 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:25 fir-md1-s1 kernel: Lustre: 23646:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 98 previous similar messages Jul 28 11:46:28 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+6s req@ffff8f35daebfc50 x1638909938569376/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:22/0 lens 488/440 e 0 to 0 dl 1564339582 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1277529-cbf1-b0b5-ff2d-5b114cf66536 (at 10.9.112.14@o2ib4), client will retry: rc -110 Jul 28 11:46:28 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 28 11:46:28 fir-md1-s1 kernel: Lustre: 38766:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. 
req@ffff8f35daebd050 x1638884756138800/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:22/0 lens 488/440 e 0 to 0 dl 1564339582 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 11:46:28 fir-md1-s1 kernel: Lustre: 38766:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 88 previous similar messages Jul 28 11:46:28 fir-md1-s1 kernel: LustreError: 46586:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 65 previous similar messages Jul 28 11:46:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8a8b762b-3d50-8250-7301-05eab7cb4e19 (at 10.8.16.7@o2ib6), client will retry: rc = -110 Jul 28 11:46:39 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f092912c850 x1639514030739872/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564339604 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:39 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 28 11:46:41 fir-md1-s1 kernel: Lustre: 21542:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f212b6cd050 x1631590622396848/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:16/0 lens 488/440 e 0 to 0 dl 1564339606 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:44 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f342766b850 x1638884756138976/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564339604 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b9b7d443-6e99-c10b-4d68-3e3fa30c5530 (at 10.9.113.5@o2ib4), client will retry: rc -110 Jul 28 11:46:44 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 28 11:46:44 fir-md1-s1 kernel: 
LustreError: 49462:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 28 11:46:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.48@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f10495e4ec0/0x5d9ee69304c67e3f lrc: 3/0,0 mode: PR/PR res: [0x20000facb:0xb9ce:0x0].0x0 bits 0x5b/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.9.101.48@o2ib4 remote: 0x2eaeebf6affef550 expref: 6672 pid: 24584 timeout: 3454665 lvb_type: 0 Jul 28 11:46:46 fir-md1-s1 kernel: Lustre: 21970:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:23s); client may timeout. req@ffff8f420d88f200 x1635247421816896/t435449552222(0) o36->50589ff6-c33e-a1c3-e1ce-e27ed9cd0c25@10.9.101.48@o2ib4:22/0 lens 512/424 e 0 to 0 dl 1564339582 ref 1 fl Complete:/0/0 rc 0/0 Jul 28 11:46:46 fir-md1-s1 kernel: Lustre: 21970:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 28 11:46:47 fir-md1-s1 kernel: LustreError: 23106:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f26dd80e850 x1631630407521776/t0(0) o3->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1564339607 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:48 fir-md1-s1 kernel: LustreError: 24572:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2dc8114850 x1633728244890592/t0(0) o3->c7b943f0-288f-0782-2e1c-59dfe4343697@10.8.7.22@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1564339607 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:49 fir-md1-s1 kernel: Lustre: 24572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f34d6aa9c50 x1639245040235584/t0(0) o3->d958ad69-3bbc-9cba-9027-0e7e6ffc5069@10.9.115.8@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564339614 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:49 fir-md1-s1 kernel: Lustre: 
24572:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages Jul 28 11:46:53 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+6s req@ffff8f212b6ce450 x1633729269348544/t0(0) o4->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:17/0 lens 488/448 e 0 to 0 dl 1564339607 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:46:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client will retry: rc = -110 Jul 28 11:46:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 28 11:46:53 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 11:46:53 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 21 previous similar messages Jul 28 11:49:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 11:49:12 fir-md1-s1 kernel: Lustre: Skipped 3766 previous similar messages Jul 28 11:51:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 11:51:29 fir-md1-s1 kernel: Lustre: Skipped 1299 previous similar messages Jul 28 11:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 11:56:20 fir-md1-s1 kernel: Lustre: Skipped 2474 previous similar messages Jul 28 11:56:31 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1846e43800 Jul 28 11:56:31 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2318c1aa00 Jul 28 11:56:31 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 
ffff8f23fd5bdc00 Jul 28 11:56:33 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f250167dc50 x1639194950709808/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564340206 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:56:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4), client will retry: rc -110 Jul 28 11:56:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 11:56:33 fir-md1-s1 kernel: LustreError: 46560:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 28 11:56:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 7e720460-379f-79e9-c8f9-d298654a333f (at 10.9.106.29@o2ib4), client will retry: rc = -110 Jul 28 11:56:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 11:56:41 fir-md1-s1 kernel: Lustre: 35233:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3ec90b3450 x1638872784312880/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564340206 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 11:56:41 fir-md1-s1 kernel: Lustre: 35233:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 28 11:56:42 fir-md1-s1 kernel: LustreError: 59211:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0610c42850 x1638872618703024/t0(0) o3->e3c32682-5f6c-0001-d03b-79e797f51faf@10.9.115.5@o2ib4:1/0 lens 488/440 e 0 to 0 dl 1564340221 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:56:42 fir-md1-s1 kernel: LustreError: 59211:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 33 previous similar messages Jul 28 11:56:46 fir-md1-s1 kernel: LustreError: 46573:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 15+0s req@ffff8f1d6ceaa450 x1638833534373680/t0(0) 
o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564340206 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:56:46 fir-md1-s1 kernel: LustreError: 46573:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 28 11:56:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with dd055160-73e1-c0f8-3c11-ca5351f1fd45 (at 10.9.105.71@o2ib4), client will retry: rc = -110 Jul 28 11:56:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 11:56:51 fir-md1-s1 kernel: LustreError: 35241:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0bf5528c50 x1639245047024160/t0(0) o3->d958ad69-3bbc-9cba-9027-0e7e6ffc5069@10.9.115.8@o2ib4:1/0 lens 488/440 e 0 to 0 dl 1564340221 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 11:56:51 fir-md1-s1 kernel: LustreError: 35241:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 26 previous similar messages Jul 28 11:58:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 11:58:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 11:59:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 11:59:23 fir-md1-s1 kernel: Lustre: Skipped 487 previous similar messages Jul 28 12:01:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 12:01:30 fir-md1-s1 kernel: Lustre: Skipped 150 previous similar messages Jul 28 12:06:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 12:06:39 fir-md1-s1 kernel: Lustre: Skipped 317 previous similar messages Jul 28 12:08:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 12:08:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 12:09:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 12:09:54 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 28 12:12:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 28 12:12:28 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 28 12:13:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f26e5cdb000, cur 1564341236 expire 1564341086 last 1564341009 Jul 28 12:17:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 12:17:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 12:19:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 12:19:55 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 28 12:20:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 12:20:46 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 23698:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=5 reqQ=0 recA=5, svcEst=1, delay=6216 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 23698:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 23698:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f06a9c63300 x1638096984731040/t0(0) o101->ae9081b9-15b7-d037-713b-67343872796f@10.9.104.27@o2ib4:12/0 lens 576/3264 e 0 to 0 dl 1564341762 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 23698:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 25631:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f2307156450 x1638799552030048/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 25631:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 12 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 14789:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.4@o2ib4: deadline 6:1s ago req@ffff8f0766d06c50 x1638872803629168/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/0 e 0 to 0 dl 1564341762 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 14789:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 20 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 14789:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8f0766d06c50 x1638872803629168/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/0 e 0 to 0 dl 1564341762 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 14789:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 28 12:22:44 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (0): c: 4, oc: 0, rc: 7 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 48193:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f2acb391850 x1634135245753632/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564341762 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 48193:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 8 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 23574:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564341756/real 1564341756] req@ffff8f11d482f800 x1636748255929008/t0(0) o106->fir-MDT0002@10.9.109.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564341763 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 23574:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3bd2334e00 Jul 28 12:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 93 seconds Jul 28 12:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 24 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f170632f200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2c1f812400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34efd1e200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f12961b7200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f276c03e200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3e12b82400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2707b86c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f276c039400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f12961b3800 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f914310c-7825-8c6a-2b04-354707ee5046 (at 10.9.113.3@o2ib4), client will retry: rc -110 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3cf3aaec00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f337a78b400 Jul 28 12:22:44 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 12:22:44 fir-md1-s1 kernel: LNetError: 
20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1228d1f800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2d8e402e00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3cf3aabc00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2c1f811800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f337a789e00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3141d52a00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f1decca7800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1394ca3200 Jul 28 12:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.212@o2ib7: accepting Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f09c62f0200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d8e401c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f230639ca00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2707b80e00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f18a4210c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1228d1c800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f3cf3aaca00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f38dcf66a00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f43f9dcbc00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f07a5735c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f230639a800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3cd7651800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2270381000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3cd7654200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1034834200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10222c8e00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d8e407400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2c1f815e00 Jul 28 12:22:44 fir-md1-s1 kernel: 
LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f00429800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f337a78ac00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3766f6c400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3bd2331e00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2707b86c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19452a0a00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ebce5d800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34efd1f000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3766f6b200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f0042a000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3766f6a200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e12b84c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f170632b600 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f170632aa00 Jul 28 12:22:44 fir-md1-s1 kernel: 
LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f11a1019000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1394ca6000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19452a4c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3cf3aae000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f203f5f0800 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2318c1a600 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f230639f400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f088ce2d400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f088ce2e400 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f088ce28c00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2306398a00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1034836e00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f305a7ba200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f337a78e600 Jul 28 12:22:44 fir-md1-s1 kernel: 
LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3484248a00 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f276c03c000 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f0042f200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ddaa19200 Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ddaa1d800 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.114.7@o2ib4, removing former export from same NID Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2acb396850 x1631611055086064/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564341786 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=21 reqQ=0 recA=23, svcEst=1, delay=6614 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 14 previous similar messages Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0766d06c50 x1638872803629168/t0(0) o3->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:12/0 lens 488/0 e 0 to 0 dl 1564341762 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 12:22:44 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 91 previous similar messages Jul 28 12:22:47 fir-md1-s1 kernel: Lustre: 46585:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. req@ffff8f40ec006450 x1638884832810784/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564341762 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 12:22:47 fir-md1-s1 kernel: LustreError: 21284:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+5s req@ffff8f39a83a8450 x1638955938642656/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:12/0 lens 488/440 e 0 to 0 dl 1564341762 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 12:22:47 fir-md1-s1 kernel: LustreError: 21284:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 64 previous similar messages Jul 28 12:22:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -110 Jul 28 12:22:47 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 28 12:22:47 fir-md1-s1 kernel: Lustre: 46585:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 106 previous similar messages Jul 28 12:22:49 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1d41790050 x1638884784069792/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:6/0 lens 488/440 e 0 to 0 dl 1564341786 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 12:22:49 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 28 12:22:50 fir-md1-s1 kernel: Lustre: 23574:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1564341763/real 1564341763] req@ffff8f11d482f800 x1636748255929008/t0(0) o106->fir-MDT0002@10.9.109.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564341770 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 28 12:22:50 fir-md1-s1 kernel: Lustre: 21312:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:8s); client may timeout. req@ffff8f0734f0c200 x1639983138564304/t0(0) o101->3dbcc330-2c63-35af-d90b-dcca1dd83d4b@10.9.105.68@o2ib4:12/0 lens 480/536 e 0 to 0 dl 1564341762 ref 1 fl Complete:/0/0 rc 301/301 Jul 28 12:22:50 fir-md1-s1 kernel: Lustre: 21312:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 28 12:22:50 fir-md1-s1 kernel: Lustre: 23574:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 28 12:22:58 fir-md1-s1 kernel: Lustre: 26254:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a14fab600 x1636450139288464/t0(0) o101->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:3/0 lens 480/568 e 1 to 0 dl 1564341783 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 12:22:58 fir-md1-s1 kernel: Lustre: 26254:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jul 28 12:23:01 fir-md1-s1 kernel: Lustre: 20499:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1325cd5850 x1638938525361296/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564341786 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 12:23:01 fir-md1-s1 kernel: Lustre: 20499:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 28 12:23:03 fir-md1-s1 kernel: LustreError: 21038:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f33a8871850 x1639157407054352/t0(0) 
o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564341783 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 12:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d958ad69-3bbc-9cba-9027-0e7e6ffc5069 (at 10.9.115.8@o2ib4), client will retry: rc -110 Jul 28 12:23:03 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 28 12:23:03 fir-md1-s1 kernel: LustreError: 21038:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 6 previous similar messages Jul 28 12:23:07 fir-md1-s1 kernel: Lustre: 68193:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f40ec007c50 x1638833572535872/t0(0) o3->0d11f504-1c11-cd97-b8af-49b86c52b9a6@10.9.112.6@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564341786 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 12:23:07 fir-md1-s1 kernel: Lustre: 68193:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 28 12:23:07 fir-md1-s1 kernel: LustreError: 46525:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f33a8871c50 x1638886066939872/t0(0) o3->534e10c9-e8b6-b009-609a-c6de708bb45f@10.8.27.35@o2ib6:13/0 lens 488/440 e 0 to 0 dl 1564341793 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 12:23:08 fir-md1-s1 kernel: Lustre: 46555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f33a8872050 x1638888505061936/t0(0) o3->11f7dba6-7171-5836-2062-1974c5637c6a@10.8.28.11@o2ib6:13/0 lens 488/440 e 0 to 0 dl 1564341793 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 12:23:08 fir-md1-s1 kernel: Lustre: 46555:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 132 previous similar messages Jul 28 12:23:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6eed6c6e-bf9d-6eed-41d9-2953d0976391 (at 10.9.101.4@o2ib4), client will retry: rc = -110 Jul 28 12:23:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 
12:23:15 fir-md1-s1 kernel: Lustre: 21675:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3904d4b900 x1631670318554496/t0(0) o101->2027e649-8bcd-4ca1-6dcb-dd11dcd45e21@10.9.101.17@o2ib4:20/0 lens 480/568 e 0 to 0 dl 1564341800 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 12:23:15 fir-md1-s1 kernel: Lustre: 21675:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 28 12:23:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.9.102.46@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1ce6586c00/0x5d9ee69347125866 lrc: 3/0,0 mode: PR/PR res: [0x2c002be52:0xc626:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.102.46@o2ib4 remote: 0xc69d8121937184f8 expref: 1793 pid: 24585 timeout: 3456859 lvb_type: 0 Jul 28 12:23:20 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f148b1adc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e41aba400/0x5d9ee693471703c2 lrc: 3/0,0 mode: PW/PW res: [0x2c002be52:0xc626:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x50200000000000 nid: 10.9.102.46@o2ib4 remote: 0xc69d8121937185e6 expref: 608 pid: 97672 timeout: 0 lvb_type: 0 Jul 28 12:23:20 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:17s); client may timeout. 
req@ffff8f1a14fab600 x1636450139288464/t0(0) o101->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:3/0 lens 480/536 e 1 to 0 dl 1564341783 ref 1 fl Complete:/0/0 rc -107/-107 Jul 28 12:23:20 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Jul 28 12:23:39 fir-md1-s1 kernel: Lustre: 97652:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f1f74bd3300 x1633886520093888/t0(0) o101->99661aab-9554-9a66-a9ba-0efac2d490ec@10.9.101.5@o2ib4:14/0 lens 480/568 e 0 to 0 dl 1564341824 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 12:23:46 fir-md1-s1 kernel: Lustre: 97664:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (22:2s); client may timeout. req@ffff8f1f74bd3300 x1633886520093888/t0(0) o101->99661aab-9554-9a66-a9ba-0efac2d490ec@10.9.101.5@o2ib4:14/0 lens 480/536 e 0 to 0 dl 1564341824 ref 1 fl Complete:/0/0 rc 0/0 Jul 28 12:27:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 12:27:46 fir-md1-s1 kernel: Lustre: Skipped 391 previous similar messages Jul 28 12:29:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 12:29:56 fir-md1-s1 kernel: Lustre: Skipped 529 previous similar messages Jul 28 12:32:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 28 12:32:54 fir-md1-s1 kernel: Lustre: Skipped 129 previous similar messages Jul 28 12:37:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 12:37:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 12:37:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 12:37:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 28 12:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 12:40:04 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 28 12:43:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 12:43:03 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 28 12:47:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 12:47:30 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 12:47:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 12:47:53 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 28 12:50:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 12:50:11 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 28 12:53:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 12:53:11 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 28 12:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 12:58:06 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 13:00:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Jul 28 13:00:17 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 28 13:03:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 13:03:18 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 28 13:04:25 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0c49a27050 x1638888957338736/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:0/0 lens 488/440 e 1 to 0 dl 1564344270 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:04:25 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 28 13:04:33 fir-md1-s1 kernel: Lustre: 13960:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f0c49a27050 x1638888957338736/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:0/0 lens 488/408 e 1 to 0 dl 1564344270 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 28 13:08:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 13:08:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 13:10:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 13:10:21 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 28 13:10:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 13:10:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 13:14:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 13:14:28 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 28 13:18:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 13:18:23 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 13:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 13:20:28 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 28 13:25:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 13:25:42 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 13:28:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 13:28:33 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 28 13:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 13:30:37 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Jul 28 13:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fc769e800, cur 1564346138 expire 1564345988 last 1564345911 Jul 28 13:35:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 13:35:51 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: 46537:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=12 reqQ=0 recA=12, svcEst=1, delay=5523 Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: 46537:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f38c5bb0c50 x1639514295805040/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564346265 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: 46537:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LustreError: 21036:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.5@o2ib4: deadline 6:1s ago req@ffff8f2fde148450 x1639514295804848/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:15/0 lens 488/0 e 0 to 0 dl 1564346265 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 13:37:46 fir-md1-s1 kernel: LustreError: 21036:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 22 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: 27059:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f241333d400 x1638794180216208/t0(0) o35->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: 27059:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 32 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: 21036:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8f2fde148450 x1639514295804848/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:15/0 lens 488/0 e 0 to 0 dl 1564346265 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 28 13:37:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Jul 28 13:37:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (7): c: 1, oc: 0, rc: 8 Jul 28 13:37:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 28 13:37:46 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 43 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LustreError: 44037:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f1df0335c50 x1635693011086448/t0(0) o3->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:15/0 lens 488/440 e 0 to 0 dl 1564346265 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:46 fir-md1-s1 kernel: LustreError: 44037:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 125 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds Jul 28 13:37:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 31 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f267fbe9600 Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 62873e5a-5401-394e-2139-5fd47462d1df (at 
10.8.29.2@o2ib6), client will retry: rc -110 Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 28 13:37:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.209@o2ib7: connected Jul 28 13:37:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Skipped 1 previous similar message Jul 28 13:37:46 fir-md1-s1 kernel: LustreError: 35241:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f147526dc50 x1638956080036016/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564346265 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:46 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 28 13:37:47 fir-md1-s1 kernel: LustreError: 52249:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0f9c4f0050 x1639195128890432/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564346265 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:47 fir-md1-s1 kernel: Lustre: 52249:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2s); client may timeout. 
req@ffff8f0f9c4f0050 x1639195128890432/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564346265 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 13:37:47 fir-md1-s1 kernel: Lustre: 52249:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 70 previous similar messages Jul 28 13:37:47 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f17567d3850 x1633737460162192/t0(0) o4->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:6/0 lens 488/448 e 1 to 0 dl 1564346286 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:47 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 28 13:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6), client will retry: rc = -110 Jul 28 13:37:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 13:37:50 fir-md1-s1 kernel: LustreError: 21038:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3252f40450 x1634084532352720/t0(0) o4->49aa8323-a38d-3237-508c-ea94c68aa863@10.9.108.53@o2ib4:6/0 lens 488/448 e 1 to 0 dl 1564346286 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 49aa8323-a38d-3237-508c-ea94c68aa863 (at 10.9.108.53@o2ib4), client will retry: rc = -110 Jul 28 13:37:53 fir-md1-s1 kernel: Lustre: 23750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564346266/real 1564346266] req@ffff8f2ce6ebec00 x1636748274298896/t0(0) o106->fir-MDT0000@10.9.101.27@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564346273 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 13:37:53 fir-md1-s1 kernel: Lustre: 23750:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 28 13:37:53 fir-md1-s1 kernel: LustreError: 
46510:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2b783a3050 x1638956080042512/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:29/0 lens 488/440 e 0 to 0 dl 1564346279 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:37:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -110 Jul 28 13:37:53 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 28 13:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with d7baf791-4e64-c4d2-0126-7b11628a9a4c (at 10.8.12.26@o2ib6), client will retry: rc = -110 Jul 28 13:38:01 fir-md1-s1 kernel: Lustre: 21714:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1df0337850 x1638938617700544/t0(0) o3->294f669a-76d8-9cb4-d54f-e33a51dba159@10.9.112.11@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564346286 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:38:02 fir-md1-s1 kernel: Lustre: 21448:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1df0332c50 x1631630658025536/t0(0) o3->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:7/0 lens 488/440 e 1 to 0 dl 1564346287 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:38:02 fir-md1-s1 kernel: Lustre: 21448:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Jul 28 13:38:06 fir-md1-s1 kernel: LustreError: 21538:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2b7e70a050 x1631568692554304/t0(0) o3->9cb3a3a7-431b-a8f5-fef2-8703076397cf@10.9.107.42@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564346286 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 6a6c91bf-5994-6d2d-e34d-9ae740d430ac (at 10.9.107.29@o2ib4), client will retry: rc -110 Jul 28 13:38:06 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar 
messages Jul 28 13:38:06 fir-md1-s1 kernel: LustreError: 21538:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 45 previous similar messages Jul 28 13:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ce25d0e1-042f-8e04-e899-a91b78d4bc2b (at 10.9.102.61@o2ib4), client will retry: rc = -110 Jul 28 13:38:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 13:38:10 fir-md1-s1 kernel: Lustre: 23708:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10c6346600 x1638091216579088/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:15/0 lens 576/3264 e 1 to 0 dl 1564346295 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:38:19 fir-md1-s1 kernel: Lustre: 97645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1c586ce300 x1639005986716224/t0(0) o101->ecc7bd82-09bc-0059-52c6-3ab0877f2eb2@10.9.106.10@o2ib4:24/0 lens 1784/3288 e 0 to 0 dl 1564346304 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 13:38:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.106.58@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f24cf215100/0x5d9ee693b1b829bc lrc: 3/0,0 mode: PR/PR res: [0x200029d11:0x45f4:0x0].0x0 bits 0x13/0x0 rrc: 43 type: IBT flags: 0x60200400000020 nid: 10.9.106.58@o2ib4 remote: 0x2be890364ef13ebe expref: 277 pid: 21455 timeout: 3461363 lvb_type: 0 Jul 28 13:38:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 28 13:38:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eaf995be-0d27-b013-5e90-e619713af34c (at 10.8.13.6@o2ib6) reconnecting Jul 28 13:38:35 fir-md1-s1 kernel: Lustre: Skipped 8150 previous similar messages Jul 28 13:40:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 
28 13:40:40 fir-md1-s1 kernel: Lustre: Skipped 8990 previous similar messages Jul 28 13:43:41 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1913432850 x1631353516428176/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:10/0 lens 488/448 e 0 to 0 dl 1564346650 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 13:43:41 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 14 previous similar messages Jul 28 13:43:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 28 13:43:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 13:45:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 13:46:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 13:46:38 fir-md1-s1 kernel: Lustre: Skipped 748 previous similar messages Jul 28 13:46:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 13:47:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 13:48:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 13:48:58 fir-md1-s1 kernel: Lustre: Skipped 150 previous similar messages Jul 28 13:50:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.113.10@o2ib4) Jul 28 13:50:41 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 28 13:52:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 13:56:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 13:56:53 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 28 13:58:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 13:58:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 13:59:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 13:59:09 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 14:01:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 14:01:01 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 28 14:05:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 14:05:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 14:08:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 14:08:16 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 14:09:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 14:09:09 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 14:11:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 14:11:28 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 28 14:18:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 14:18:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 14:18:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 14:18:17 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 28 14:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 14:19:11 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 28 14:21:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 14:21:40 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 28 14:28:26 fir-md1-s1 kernel: Lustre: 21683:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0bd7247c50 x1634151489424512/t0(0) o4->1b7ab77d-df0f-991c-2a23-156fc86c9ce8@10.9.101.30@o2ib4:1/0 lens 10088/448 e 0 to 0 dl 1564349311 ref 2 fl 
Interpret:/0/0 rc 0/0 Jul 28 14:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 14:29:13 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 28 14:29:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 14:29:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 14:31:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 14:31:42 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 28 14:35:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 14:35:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 14:39:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 14:39:36 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 28 14:39:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 14:39:39 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 14:41:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 14:41:46 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 28 14:41:58 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 28 14:41:58 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 28 14:48:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 14:48:05 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 14:49:02 fir-md1-s1 kernel: Lustre: 30346:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f150015f850 x1639239071086800/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564350547 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 14:49:16 fir-md1-s1 kernel: Lustre: 21567:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f150015f850 x1639239071086800/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:7/0 lens 488/408 e 1 to 0 dl 1564350547 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 28 14:49:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 14:49:39 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 28 14:50:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 14:50:27 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 14:52:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 14:52:03 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 28 14:53:34 fir-md1-s1 kernel: Lustre: 20499:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f146bc75c50 x1638910378344000/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564350819 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 14:53:47 fir-md1-s1 kernel: Lustre: 22427:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); 
client may timeout. req@ffff8f146bc75c50 x1638910378344000/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:9/0 lens 488/408 e 1 to 0 dl 1564350819 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 28 14:59:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 14:59:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 14:59:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 14:59:41 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 28 15:01:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 15:01:03 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 28 15:02:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 15:02:16 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 15:03:37 fir-md1-s1 kernel: Lustre: 49467:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f342aef2450 x1631686767737808/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:12/0 lens 504/448 e 0 to 0 dl 1564351422 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 15:03:42 fir-md1-s1 kernel: LustreError: 49466:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f342aef2450 x1631686767737808/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:12/0 lens 504/448 e 0 to 0 dl 1564351422 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 15:03:42 fir-md1-s1 kernel: LustreError: 49466:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 28 15:03:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 
6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Jul 28 15:09:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 15:09:18 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 15:09:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 15:09:46 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 15:10:36 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f15a1acf050 x1631686767796560/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:5/0 lens 504/448 e 0 to 0 dl 1564351865 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 15:10:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Jul 28 15:11:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 15:11:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 15:12:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 15:12:28 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 28 15:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 15:19:56 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 28 15:22:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 15:22:15 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 28 15:22:20 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 15:22:20 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 28 15:22:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 15:22:31 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 28 15:26:03 fir-md1-s1 kernel: Lustre: 21385:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0cbf6f5450 x1638903101868656/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564352768 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 15:26:14 fir-md1-s1 kernel: LustreError: 57558:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0cbf6f5450 x1638903101868656/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564352768 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 15:26:14 fir-md1-s1 kernel: LustreError: 57558:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 1 previous similar message Jul 28 15:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with fba6feb3-1d06-9f10-9905-c04ad67c5c45 (at 10.9.115.13@o2ib4), client will retry: rc -107 Jul 28 15:26:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 15:26:14 fir-md1-s1 kernel: Lustre: 57558:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:6s); client may timeout. 
req@ffff8f0cbf6f5450 x1638903101868656/t0(0) o3->fba6feb3-1d06-9f10-9905-c04ad67c5c45@10.9.115.13@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1564352768 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 15:30:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 15:30:05 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 28 15:30:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f289dd97000, cur 1564353048 expire 1564352898 last 1564352821 Jul 28 15:32:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 15:32:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 28 15:32:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 15:32:36 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 28 15:40:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 15:40:16 fir-md1-s1 kernel: Lustre: Skipped 81715 previous similar messages Jul 28 15:40:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 15:40:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 15:42:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 15:42:51 fir-md1-s1 kernel: Lustre: Skipped 81767 previous similar messages Jul 28 15:44:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 15:44:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 15:50:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 15:50:17 fir-md1-s1 kernel: Lustre: Skipped 148 previous similar messages Jul 28 15:53:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 15:53:02 fir-md1-s1 kernel: Lustre: Skipped 127 previous similar messages Jul 28 15:53:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 15:53:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 15:57:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 15:57:16 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 16:00:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 16:00:55 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 28 16:02:28 fir-md1-s1 kernel: Lustre: 22428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f127957a850 x1639157666593776/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564354953 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 16:03:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 16:03:05 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 16:06:05 fir-md1-s1 kernel: Lustre: 52249:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f133a6b1050 x1638910583743520/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564355170 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 16:07:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 16:07:19 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 28 16:11:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 16:11:31 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 16:12:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 28 16:12:48 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 16:13:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 16:13:15 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 28 16:19:16 fir-md1-s1 kernel: LustreError: 6548:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f268aa97c50 x1631686768311856/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:4/0 lens 504/448 e 1 to 0 dl 1564355974 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 16:19:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Jul 28 16:19:49 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f14895fe050 x1634533294349264/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564355994 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 16:19:58 fir-md1-s1 kernel: LustreError: 20501:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -4+4s req@ffff8f14895fe050 x1634533294349264/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564355994 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 16:19:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with bf0fab1f-ed86-800d-24d6-23f47310966d (at 10.9.113.8@o2ib4), client will retry: rc -110 Jul 28 16:19:58 fir-md1-s1 kernel: Lustre: 20501:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. 
req@ffff8f14895fe050 x1634533294349264/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564355994 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 16:22:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 16:22:12 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 16:22:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 16:22:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 16:23:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 16:23:26 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 28 16:27:30 fir-md1-s1 kernel: Lustre: 20727:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564356443/real 1564356443] req@ffff8f202cc36000 x1636748332670880/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564356450 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Jul 28 16:27:36 fir-md1-s1 kernel: Lustre: 10561:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564356449/real 1564356449] req@ffff8f39af5a8c00 x1636748332798752/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564356456 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 16:27:44 fir-md1-s1 kernel: Lustre: 20727:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564356457/real 1564356457] req@ffff8f202cc36000 x1636748332670880/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564356464 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Jul 28 16:27:44 fir-md1-s1 kernel: Lustre: 20727:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 28 16:27:57 fir-md1-s1 kernel: 
Lustre: 10589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564356469/real 1564356469] req@ffff8f147330f200 x1636748333057872/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564356476 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 16:32:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 16:32:22 fir-md1-s1 kernel: Lustre: Skipped 20734 previous similar messages Jul 28 16:32:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 16:32:38 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 16:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 16:33:27 fir-md1-s1 kernel: Lustre: Skipped 20772 previous similar messages Jul 28 16:35:30 fir-md1-s1 kernel: Lustre: 10151:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564356923/real 1564356923] req@ffff8f3086342400 x1636748335272768/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564356930 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 16:37:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 16:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 16:42:31 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 28 16:43:19 fir-md1-s1 kernel: Lustre: 23645:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564357392/real 1564357392] req@ffff8f3247a47b00 x1636748338064224/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564357399 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 16:43:19 fir-md1-s1 kernel: Lustre: 23645:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 28 16:43:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 16:43:43 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 28 16:43:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 16:43:48 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 28 16:52:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 16:52:33 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 28 16:53:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 16:53:47 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 28 16:55:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 16:55:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 16:59:01 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564358334/real 1564358334] req@ffff8f1d94f6a400 x1636748344004384/t0(0) 
o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564358341 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 28 17:02:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 17:02:34 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 28 17:03:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 17:03:49 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 28 17:06:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 17:06:00 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 17:11:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:11:34 fir-md1-s1 kernel: Lustre: 52409:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0634d54850 x1636581764612752/t0(0) o3->42f49237-eaa5-3549-e9cf-6b0ef8d87e1a@10.9.112.7@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564359099 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 17:12:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 17:12:37 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 28 17:13:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 17:14:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 17:14:14 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 28 17:15:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:16:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 17:16:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 17:17:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:17:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 17:22:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 17:22:45 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 28 17:23:31 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0c94c44800, cur 1564359811 expire 1564359661 last 1564359584 Jul 28 17:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 17:24:19 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 28 17:27:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 17:27:18 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 17:28:37 fir-md1-s1 kernel: LustreError: 46513:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f34efc4c450 x1631686768838880/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:6/0 lens 504/448 e 0 to 0 dl 1564360146 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 17:28:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Jul 28 17:32:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:32:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 17:32:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 17:32:53 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 17:33:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 17:33:52 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0503584050 x1634135876035008/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:27/0 lens 488/440 e 1 to 0 dl 1564360437 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 17:34:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 17:34:21 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 28 17:39:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 17:39:34 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 28 17:43:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 17:43:04 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 28 17:44:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 17:44:30 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 28 17:44:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 17:51:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 17:51:53 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 17:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 17:53:06 fir-md1-s1 kernel: Lustre: Skipped 61412 previous similar messages Jul 28 17:55:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 17:55:00 fir-md1-s1 kernel: Lustre: Skipped 61426 previous similar messages Jul 28 17:55:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:56:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 17:57:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 18:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 18:03:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 28 18:04:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 18:04:07 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 28 18:05:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 18:05:05 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 18:07:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:09:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:10:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 18:13:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 18:13:22 fir-md1-s1 kernel: Lustre: Skipped 37331 previous similar messages Jul 28 18:14:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 18:14:19 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 28 18:15:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 18:15:54 fir-md1-s1 kernel: Lustre: Skipped 37385 previous similar messages Jul 28 18:19:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:23:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 18:23:49 fir-md1-s1 kernel: Lustre: Skipped 51022 previous similar messages Jul 28 18:25:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 18:25:39 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 18:25:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 18:25:54 fir-md1-s1 kernel: Lustre: Skipped 51057 previous similar messages Jul 28 18:27:41 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 18:28:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 18:33:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 18:33:56 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 28 18:34:08 fir-md1-s1 kernel: LustreError: 46552:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1e79c66450 x1631686769340448/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:7/0 lens 504/448 e 0 to 0 dl 1564364077 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 18:34:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Jul 28 18:36:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 18:36:00 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 28 18:38:29 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0f55794450 x1638956639948432/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564364314 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 18:38:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 28 18:38:44 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 28 18:38:49 fir-md1-s1 kernel: LustreError: 16648:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0f55794450 x1638956639948432/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564364314 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 18:38:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4), client will retry: rc -107 Jul 28 18:38:49 fir-md1-s1 kernel: Lustre: 16648:0:(service.c:2165:ptlrpc_server_handle_request()) 
@@@ Request took longer than estimated (20:15s); client may timeout. req@ffff8f0f55794450 x1638956639948432/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564364314 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 18:39:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:39:39 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 18:40:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:41:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:44:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 18:44:11 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 28 18:46:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 18:46:02 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 28 18:49:34 fir-md1-s1 kernel: Lustre: 57558:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0d5ae56850 x1631637280109920/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:9/0 lens 488/440 e 0 to 0 dl 1564364979 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 18:49:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 18:49:36 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 28 18:49:40 fir-md1-s1 
kernel: Lustre: 81716:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f0d5ae56850 x1631637280109920/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:9/0 lens 488/408 e 0 to 0 dl 1564364979 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 28 18:54:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 18:54:20 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 28 18:56:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 18:56:03 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 28 18:58:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 18:58:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 18:59:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 18:59:51 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 19:04:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 19:04:25 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 28 19:05:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 19:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 19:06:17 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 28 19:11:28 fir-md1-s1 kernel: Lustre: 35230:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b3a175050 x1636450147282048/t0(0) o3->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:3/0 lens 488/4536 e 1 to 0 dl 1564366293 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 19:12:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 19:12:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 28 19:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0979e049-f1fe-c03b-60aa-9c76a14b9428 (at 10.8.10.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2532676400, cur 1564366417 expire 1564366267 last 1564366190 Jul 28 19:14:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 19:14:29 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 28 19:15:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 19:16:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 19:16:24 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 28 19:18:51 fir-md1-s1 kernel: Lustre: 16648:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f091dcb6850 x1639195815222160/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564366736 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 19:20:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 19:20:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 19:23:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 19:23:42 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 19:24:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 19:24:38 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 28 19:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 19:26:32 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 28 19:26:45 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 28 19:27:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 28 19:30:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 19:30:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 19:34:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 19:34:39 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 28 19:36:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 19:36:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 19:36:40 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 28 19:36:40 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 28 19:40:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 19:40:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 19:43:45 fir-md1-s1 kernel: Lustre: 30346:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f11da68cc50 x1638254173592720/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:20/0 lens 488/440 e 1 to 0 dl 1564368230 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 19:43:50 fir-md1-s1 kernel: LustreError: 21545:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f11da68cc50 x1638254173592720/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:20/0 lens 488/440 e 1 to 0 dl 1564368230 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 19:43:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -110 Jul 28 19:44:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 19:44:56 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 28 19:46:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 19:46:49 fir-md1-s1 kernel: Lustre: Skipped 118 previous similar messages Jul 28 19:47:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 19:47:06 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 28 19:55:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 19:55:11 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 28 19:55:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 19:55:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 19:56:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 19:56:50 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 28 19:57:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 19:57:12 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 28 19:59:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d37f3e5-9240-7430-f3e5-aed00b2b5a17 (at 10.9.109.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14765b5400, cur 1564369149 expire 1564368999 last 1564368922 Jul 28 19:59:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 19:59:50 fir-md1-s1 kernel: Lustre: 46572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0966195c50 x1639515230805056/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:25/0 lens 488/440 e 1 to 0 dl 1564369195 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 20:05:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 20:05:15 fir-md1-s1 kernel: Lustre: Skipped 45103 previous similar messages Jul 28 20:06:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 20:06:55 fir-md1-s1 kernel: Lustre: Skipped 45112 previous similar messages Jul 28 20:10:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 20:10:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 28 20:10:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 28 20:10:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 20:15:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 20:15:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 28 20:17:01 fir-md1-s1 kernel: Lustre: 22428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f129d828850 x1631637504209776/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564370226 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 20:17:07 fir-md1-s1 kernel: Lustre: 21545:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f129d828850 x1631637504209776/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:6/0 lens 488/408 e 1 to 0 dl 1564370226 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 28 20:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 20:17:16 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 28 20:21:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 20:21:15 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 20:24:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 20:24:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 20:26:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 20:26:25 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 20:27:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 20:27:23 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 28 20:31:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 20:31:57 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 20:35:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 20:36:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 20:36:55 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 28 20:37:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 20:37:25 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 28 20:43:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 20:43:05 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 28 20:46:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 20:46:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 20:47:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 20:47:00 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 28 20:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 20:47:34 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 28 20:53:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 20:53:06 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 20:57:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 20:57:20 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 28 20:57:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 20:57:39 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 28 21:00:04 fir-md1-s1 kernel: Lustre: 16185:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0dc0540050 x1639158039815600/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564372809 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 21:00:12 fir-md1-s1 kernel: LustreError: 21715:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0dc0540050 x1639158039815600/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564372809 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 21:00:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -107 Jul 28 21:00:12 fir-md1-s1 kernel: Lustre: 
21715:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f0dc0540050 x1639158039815600/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564372809 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 21:02:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 21:02:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 21:03:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 21:03:53 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 28 21:07:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 21:07:27 fir-md1-s1 kernel: Lustre: Skipped 17420 previous similar messages Jul 28 21:07:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 21:07:40 fir-md1-s1 kernel: Lustre: Skipped 17455 previous similar messages Jul 28 21:13:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 21:13:54 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 21:14:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 21:14:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 21:17:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 21:17:33 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 28 21:17:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 21:17:43 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 28 21:24:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 21:24:25 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 28 21:25:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 21:25:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 28 21:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 21:27:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 28 21:27:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 21:27:55 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 21:34:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 21:34:28 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 21:35:20 fir-md1-s1 kernel: LustreError: 46565:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f18bb7fe050 x1631353525149120/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:9/0 lens 504/448 e 1 to 0 dl 1564374939 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 21:35:20 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 28 21:38:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 21:38:03 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 21:38:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 28 21:38:03 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 28 21:38:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 21:38:19 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 21:47:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 21:47:43 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 28 21:48:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 21:48:34 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 28 21:49:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 21:49:13 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 21:51:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 23a20415-f4f1-c881-311c-3f10763e2071 (at 10.8.2.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f196828e000, cur 1564375897 expire 1564375747 last 1564375670 Jul 28 21:51:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 21:53:16 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1240c16c50 x1638254386102896/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564376001 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 21:53:30 fir-md1-s1 kernel: LustreError: 35239:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1240c16c50 x1638254386102896/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564376001 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 21:53:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3ff68780-4eb8-0406-dadc-cabf67c4a043 (at 10.9.114.15@o2ib4), client will retry: rc -107 Jul 28 21:53:30 fir-md1-s1 kernel: Lustre: 35239:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:9s); client may timeout. req@ffff8f1240c16c50 x1638254386102896/t0(0) o3->3ff68780-4eb8-0406-dadc-cabf67c4a043@10.9.114.15@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564376001 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 21:55:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 21:55:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 21:57:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 21:57:46 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 28 21:58:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 21:58:49 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 28 21:59:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 21:59:16 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 28 22:08:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 22:08:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 22:08:55 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 28 22:08:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 22:08:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 28 22:09:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 22:09:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 28 22:19:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 22:19:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 22:19:08 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 28 22:19:08 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 28 22:19:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 22:19:30 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 28 22:20:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 22:20:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 22:29:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 22:29:12 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 28 22:29:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 28 22:29:21 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 28 22:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 22:29:35 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 28 22:36:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3354b3b000, cur 1564378604 expire 1564378454 last 1564378377 Jul 28 22:36:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 28 22:39:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 22:39:12 fir-md1-s1 kernel: Lustre: Skipped 61641 previous similar messages Jul 28 22:39:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 22:39:27 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 28 22:40:00 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ce5755050 x1638090417751168/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564378805 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 22:40:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 22:40:05 fir-md1-s1 kernel: 
Lustre: Skipped 61630 previous similar messages Jul 28 22:40:07 fir-md1-s1 kernel: LustreError: 20501:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0ce5755050 x1638090417751168/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564378805 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 22:40:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2 (at 10.9.114.8@o2ib4), client will retry: rc -107 Jul 28 22:40:07 fir-md1-s1 kernel: Lustre: 20501:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f0ce5755050 x1638090417751168/t0(0) o3->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564378805 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 28 22:48:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 22:48:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 28 22:49:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 28 22:49:17 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 28 22:49:35 fir-md1-s1 kernel: Lustre: 35230:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f069ad50050 x1638766002966896/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564379379 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 22:49:41 fir-md1-s1 kernel: LustreError: 21484:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -2+2s req@ffff8f069ad50050 x1638766002966896/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564379379 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 22:49:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 28 22:49:41 fir-md1-s1 kernel: Lustre: 21484:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f069ad50050 x1638766002966896/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1564379379 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 28 22:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 22:50:12 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 28 22:50:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 22:50:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 22:50:56 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 22:58:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 28 22:58:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 22:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 22:59:29 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 28 23:00:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 28 23:00:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 28 23:04:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 23:04:42 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 28 23:09:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 28 23:09:37 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 28 23:10:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 23:10:22 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 28 23:13:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 23:13:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 28 23:14:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 23:14:52 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 23:19:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 23:19:43 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 28 23:20:04 fir-md1-s1 kernel: LustreError: 21497:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2478308850 x1631353526888800/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:23/0 lens 504/448 e 1 to 0 dl 1564381223 ref 1 fl Interpret:/0/0 rc 0/0 Jul 28 23:20:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Jul 28 23:20:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 28 23:20:31 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 28 23:23:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 23:25:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 23:25:35 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 28 23:29:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 23:29:47 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 28 23:30:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 28 23:30:33 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 28 23:35:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 23:35:39 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 28 23:36:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 23:36:33 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 28 23:40:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 28 23:40:08 fir-md1-s1 kernel: Lustre: Skipped 3706 previous similar messages Jul 28 23:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 28 23:40:42 fir-md1-s1 kernel: Lustre: Skipped 3666 previous similar messages Jul 28 23:40:56 fir-md1-s1 kernel: Lustre: 35230:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0eb37cdc50 x1638795583018160/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:1/0 lens 488/440 e 1 to 0 dl 1564382461 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 23:46:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 28 23:46:30 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 28 23:48:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 23:48:25 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 28 23:50:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 28 23:50:18 fir-md1-s1 kernel: Lustre: Skipped 110272 previous similar messages Jul 28 23:50:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 28 23:50:55 fir-md1-s1 kernel: Lustre: Skipped 110268 previous similar messages Jul 28 23:55:40 fir-md1-s1 kernel: Lustre: 21290:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f103d3ed050 x1638886531598848/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:15/0 lens 488/440 e 1 to 0 dl 1564383345 ref 2 fl Interpret:/0/0 rc 0/0 Jul 28 23:57:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 28 23:57:14 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 28 23:58:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 28 23:58:36 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 00:00:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 00:00:22 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 29 00:00:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 00:00:58 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 29 00:08:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 00:08:03 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 29 00:10:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 00:10:28 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 29 00:10:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 00:10:58 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 29 00:18:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 00:18:10 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 00:19:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 00:19:48 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 00:20:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 00:20:32 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Jul 29 00:21:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 00:21:04 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 00:23:02 fir-md1-s1 kernel: Lustre: 14102:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1330137050 x1638795672074912/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564384987 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 00:23:25 fir-md1-s1 kernel: Lustre: 21567:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:18s); client may timeout. req@ffff8f1330137050 x1638795672074912/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:7/0 lens 488/408 e 1 to 0 dl 1564384987 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 29 00:24:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 00:30:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 00:30:22 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 29 00:30:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 00:30:33 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 29 00:31:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 00:31:05 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 00:31:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 00:40:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 00:40:28 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 00:41:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 00:41:05 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 29 00:41:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 00:41:15 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 00:42:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 00:51:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 00:51:23 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 29 00:51:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 00:51:23 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 29 00:51:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 00:51:25 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 00:56:17 fir-md1-s1 kernel: Lustre: 22430:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a16892c50 x1638890692296544/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:22/0 lens 488/440 e 1 to 0 dl 1564386982 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 00:56:26 fir-md1-s1 kernel: Lustre: 35239:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. req@ffff8f0a16892c50 x1638890692296544/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:22/0 lens 488/408 e 1 to 0 dl 1564386982 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 29 01:01:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 01:01:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 01:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 01:01:43 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 01:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 01:01:43 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 29 01:02:48 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 29 01:04:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 01:04:46 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 29 01:11:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 01:11:46 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 01:11:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 01:11:56 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 01:11:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 01:11:56 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 29 01:14:26 fir-md1-s1 kernel: Lustre: 21711:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1309fd1c50 x1639196517535600/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:1/0 lens 488/440 e 0 to 0 dl 1564388071 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 01:14:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 01:14:49 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 29 01:22:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 01:22:17 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 29 01:22:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 01:22:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 01:22:19 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 01:22:19 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 01:25:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 01:25:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 29 01:32:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 01:32:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 01:32:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 01:32:44 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 29 01:32:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 01:32:44 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 29 01:35:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 01:35:39 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 29 01:42:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 01:42:52 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 29 01:43:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 01:43:17 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 29 01:43:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 01:43:19 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 01:49:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 01:49:42 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 29 01:52:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 01:52:52 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 29 01:52:53 fir-md1-s1 kernel: Lustre: 35241:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f140b893450 x1634136922542544/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:27/0 lens 488/440 e 1 to 0 dl 1564390377 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 01:52:58 fir-md1-s1 kernel: Lustre: 16648:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f140b893450 x1634136922542544/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:27/0 lens 488/408 e 1 to 0 dl 1564390377 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 29 01:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 01:53:27 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 29 01:53:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 01:53:28 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 29 01:59:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 01:59:50 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 29 02:02:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 02:02:56 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 29 02:04:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 02:04:06 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 02:05:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 02:05:10 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 29 02:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ba15e7800, cur 1564391365 expire 1564391215 last 1564391138 Jul 29 02:10:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 02:10:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 02:13:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 02:13:00 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 02:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 02:14:13 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 29 02:15:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 02:15:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 02:21:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 02:21:18 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 29 02:23:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 02:23:01 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 02:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 02:24:56 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 29 02:26:07 fir-md1-s1 kernel: Lustre: 13961:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10940c2050 x1639240931066976/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564392372 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 
02:26:13 fir-md1-s1 kernel: Lustre: 14790:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f10940c2050 x1639240931066976/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:12/0 lens 488/408 e 1 to 0 dl 1564392372 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 29 02:26:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 02:26:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 02:33:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 29 02:33:21 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 02:33:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 02:33:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 02:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 02:35:38 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 02:36:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 02:36:20 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 02:43:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 02:43:26 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 29 02:43:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 02:43:26 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 29 02:46:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 02:46:30 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 02:46:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 02:46:38 fir-md1-s1 kernel: LustreError: Skipped 10 previous similar messages Jul 29 02:53:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 02:53:28 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 02:53:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 02:53:28 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 29 02:56:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 02:56:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 02:57:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 02:57:24 fir-md1-s1 kernel: LustreError: Skipped 12 previous similar messages Jul 29 03:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 03:03:33 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 03:03:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 03:03:44 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 03:07:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 03:07:31 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 03:07:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 03:07:59 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 29 03:13:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 03:13:47 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 29 03:14:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 03:14:00 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 29 03:14:40 fir-md1-s1 kernel: Lustre: 35234:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0ff76be050 x1639196754778064/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564395285 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 03:17:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous 
similar messages Jul 29 03:18:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f439567c800, cur 1564395509 expire 1564395359 last 1564395282 Jul 29 03:19:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 03:19:48 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 03:23:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 03:23:51 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 29 03:25:19 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f324a7c7800 x1638869859860720/t0(0) o101->8df94149-5690-262d-f805-cc7898f99b40@10.8.16.5@o2ib6:0/0 lens 1768/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 29 03:25:19 fir-md1-s1 kernel: LustreError: 55010:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f0fea730c50 x1634137121550656/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:19/0 lens 488/440 e 0 to 0 dl 1564395919 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:19 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds Jul 29 03:25:19 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 40 previous similar messages Jul 29 03:25:19 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0f089dd400 Jul 29 03:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 89c5b213-fa16-71ad-d5f3-58d49989ce10 (at 10.9.115.11@o2ib4), client will retry: rc -110 Jul 29 03:25:19 fir-md1-s1 
kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 03:25:19 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f09dfd61200 Jul 29 03:25:19 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f169757e800 Jul 29 03:25:19 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f403e52fa00 Jul 29 03:25:19 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 23 previous similar messages Jul 29 03:25:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 03:25:20 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 03:25:20 fir-md1-s1 kernel: LustreError: 49467:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f349aad4c50 x1631552893929904/t0(0) o4->40b3c666-85bb-7cc6-dce2-ca98ff07da91@10.9.109.6@o2ib4:25/0 lens 488/448 e 0 to 0 dl 1564395925 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc -110 Jul 29 03:25:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 29 03:25:20 fir-md1-s1 kernel: LustreError: 49467:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 29 03:25:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 40b3c666-85bb-7cc6-dce2-ca98ff07da91 (at 10.9.109.6@o2ib4), client will retry: rc = -110 Jul 29 03:25:21 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f32075e1450 x1638796103402576/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:3/0 lens 488/440 e 1 to 0 dl 
1564395933 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6), client will retry: rc -110 Jul 29 03:25:21 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 29 03:25:21 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 27 previous similar messages Jul 29 03:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 397e53ea-489f-22f1-95c4-27ab82ab5709 (at 10.9.102.43@o2ib4), client will retry: rc = -110 Jul 29 03:25:22 fir-md1-s1 kernel: LustreError: 46582:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f43ec188450 x1638933779005728/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564395933 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:22 fir-md1-s1 kernel: LustreError: 46582:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 20 previous similar messages Jul 29 03:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8b33fbf1-f2ea-97c7-949f-7519ee33fba7 (at 10.8.2.26@o2ib6), client will retry: rc = -110 Jul 29 03:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 23dbfbee-8f3b-27e7-f711-fd69cc641360 (at 10.9.115.10@o2ib4), client will retry: rc -110 Jul 29 03:25:23 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 29 03:25:24 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2ea51b5050 x1640215542700848/t0(0) o3->296d97ff-0de3-b3eb-25b6-28238cfb0a2e@10.8.9.8@o2ib6:9/0 lens 488/440 e 1 to 0 dl 1564395939 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:24 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 13 previous similar messages Jul 29 03:25:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client will 
retry: rc = -110 Jul 29 03:25:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 29 03:25:26 fir-md1-s1 kernel: Lustre: 97647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564395919/real 1564395919] req@ffff8f1a044ab900 x1636748749126640/t0(0) o104->fir-MDT0002@10.9.104.70@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564395926 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 03:25:26 fir-md1-s1 kernel: Lustre: 97647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 29 03:25:28 fir-md1-s1 kernel: Lustre: 97600:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2b4c8bd050 x1631638636822752/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564395933 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:29 fir-md1-s1 kernel: Lustre: 46540:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3c42875c50 x1636449978898368/t0(0) o4->4bd8572e-b5f6-5460-04d5-03b51a165b92@10.9.102.48@o2ib4:4/0 lens 488/448 e 1 to 0 dl 1564395934 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:29 fir-md1-s1 kernel: Lustre: 46540:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 29 03:25:31 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2b4c8bd050 x1631638636822752/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564395933 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 29 03:25:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 29 03:25:31 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 12 previous 
similar messages Jul 29 03:25:34 fir-md1-s1 kernel: Lustre: 21565:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3a13209050 x1634488355810096/t0(0) o4->bc38572c-4dfd-e060-15f5-2bafa5ab8152@10.9.101.43@o2ib4:9/0 lens 488/448 e 1 to 0 dl 1564395939 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:34 fir-md1-s1 kernel: Lustre: 21565:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 29 03:25:34 fir-md1-s1 kernel: LustreError: 46583:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 15+0s req@ffff8f3c42875c50 x1636449978898368/t0(0) o4->4bd8572e-b5f6-5460-04d5-03b51a165b92@10.9.102.48@o2ib4:4/0 lens 488/448 e 1 to 0 dl 1564395934 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:34 fir-md1-s1 kernel: LustreError: 46583:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 29 03:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 4bd8572e-b5f6-5460-04d5-03b51a165b92 (at 10.9.102.48@o2ib4), client will retry: rc = -110 Jul 29 03:25:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 03:25:39 fir-md1-s1 kernel: LustreError: 46541:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f3a13209050 x1634488355810096/t0(0) o4->bc38572c-4dfd-e060-15f5-2bafa5ab8152@10.9.101.43@o2ib4:9/0 lens 488/448 e 1 to 0 dl 1564395939 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:39 fir-md1-s1 kernel: LustreError: 46541:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 29 03:25:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 09300796-1183-3575-4e70-90c873be0aeb (at 10.9.109.3@o2ib4), client will retry: rc -110 Jul 29 03:25:39 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Jul 29 03:25:43 fir-md1-s1 kernel: LustreError: 35241:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0be3105850 
x1638891046572080/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564395933 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:43 fir-md1-s1 kernel: Lustre: 35241:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:10s); client may timeout. req@ffff8f0be3105850 x1638891046572080/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564395933 ref 1 fl Complete:/0/ffffffff rc -107/-1 Jul 29 03:25:44 fir-md1-s1 kernel: Lustre: 49471:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f291f726450 x1638896174795680/t0(0) o4->e54a09f4-f4c0-8cfc-d512-347e4a10257c@10.9.106.57@o2ib4:19/0 lens 488/448 e 0 to 0 dl 1564395949 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:44 fir-md1-s1 kernel: Lustre: 49471:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 29 03:25:47 fir-md1-s1 kernel: LustreError: 49474:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f291f726450 x1638896174795680/t0(0) o4->e54a09f4-f4c0-8cfc-d512-347e4a10257c@10.9.106.57@o2ib4:19/0 lens 488/448 e 0 to 0 dl 1564395949 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 03:25:47 fir-md1-s1 kernel: LustreError: 49474:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 11 previous similar messages Jul 29 03:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with e54a09f4-f4c0-8cfc-d512-347e4a10257c (at 10.9.106.57@o2ib4), client will retry: rc = -110 Jul 29 03:25:47 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 29 03:27:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 48e0d20e-0492-e803-c544-6707061d1c78 (at 10.8.1.12@o2ib6) reconnecting Jul 29 03:27:43 fir-md1-s1 kernel: Lustre: Skipped 820 previous similar messages Jul 29 03:29:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 03:29:56 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 03:34:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 03:34:37 fir-md1-s1 kernel: Lustre: Skipped 1120 previous similar messages Jul 29 03:35:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 03:35:37 fir-md1-s1 kernel: Lustre: Skipped 305 previous similar messages Jul 29 03:37:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 03:37:48 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 03:40:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 03:40:12 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 29 03:44:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 03:44:40 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 29 03:46:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 03:46:46 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 03:48:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 03:48:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 03:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 03:56:25 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 03:56:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 03:56:47 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 29 03:58:57 fir-md1-s1 kernel: Lustre: 16186:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f131ab7c050 x1639241197603616/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564397942 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 03:58:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 03:58:58 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 29 03:59:03 fir-md1-s1 kernel: LustreError: 49229:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f131ab7c050 x1639241197603616/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564397942 ref 1 fl 
Interpret:/0/0 rc 0/0 Jul 29 03:59:03 fir-md1-s1 kernel: LustreError: 49229:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 29 03:59:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 29 03:59:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 03:59:03 fir-md1-s1 kernel: Lustre: 49229:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f131ab7c050 x1639241197603616/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564397942 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 04:02:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 04:02:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 04:04:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 04:04:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 04:05:56 fir-md1-s1 kernel: LustreError: 21037:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2f2297ac50 x1639241203327008/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564398374 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:05:56 fir-md1-s1 kernel: LustreError: 20499:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f08ffe7c850 x1639241203327088/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564398374 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:05:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 29 04:07:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 04:07:13 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 04:07:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 04:08:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 04:08:53 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 04:09:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 04:09:26 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 04:12:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 04:12:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 04:17:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 04:17:14 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 29 04:19:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 04:19:26 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 04:20:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 04:20:02 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 04:24:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 04:24:07 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 04:27:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 04:27:17 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 04:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 04:29:29 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 04:30:18 fir-md1-s1 kernel: Lustre: 21738:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f0fe2c13600 x1636449979396896/t0(0) o103->569c80f1-e322-40ae-cf23-d3ca8807a6fa@10.9.102.40@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 6547:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f305e9cc050 x1639241269756768/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:18/0 lens 488/440 
e 0 to 0 dl 1564399818 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34726fee00 Jul 29 04:30:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a820bb5a-e007-7544-04a5-afedbe00ee4e (at 10.9.112.16@o2ib4), client will retry: rc -110 Jul 29 04:30:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f12961b0000 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 3, status -103, desc ffff8f3bc54aea00 Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 0, rc: 8 Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff8f1e4bcea050 x1637888174206592/t0(0) o4->2718a14e-a89f-265a-5e0c-587412d87120@10.9.107.17@o2ib4:2/0 lens 504/448 e 1 to 0 dl 1564399832 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 2718a14e-a89f-265a-5e0c-587412d87120 (at 
10.9.107.17@o2ib4), client will retry: rc = -110 Jul 29 04:30:18 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 0 seconds Jul 29 04:30:18 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 6 previous similar messages Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 55538:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.29.7@o2ib6 from 10.0.10.51@o2ib7 Jul 29 04:30:18 fir-md1-s1 kernel: LNetError: 55538:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 3 previous similar messages Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2b1762a800 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 24566:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34efd1c000 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 21987:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f223dd87400 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 46521:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f4490f97000 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 49467:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f34f7e77e00 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 22649:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f0f4ee91800 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 48194:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2318c1ca00 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1489824000 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f304dfa2600 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, 
status -110, desc ffff8f1a71886800 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f14ac57ce00 Jul 29 04:30:18 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2df26bd200 Jul 29 04:30:18 fir-md1-s1 kernel: Lustre: 21738:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 29 04:30:19 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1e7afc7450 x1636352660416880/t0(0) o3->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1564399832 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f7eae5f9-18e9-99eb-0207-24a1fdf92451 (at 10.9.113.2@o2ib4), client will retry: rc -110 Jul 29 04:30:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 04:30:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.104.56@o2ib4, removing former export from same NID Jul 29 04:30:19 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 29 04:30:19 fir-md1-s1 kernel: LustreError: 48199:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f305e9cf050 x1631591966349296/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:19/0 lens 488/440 e 0 to 0 dl 1564399849 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8677433a-08df-e12f-9cbe-ab844f71c9a4 (at 10.9.106.69@o2ib4), client will retry: rc = -110 Jul 29 04:30:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 04:30:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 23dbfbee-8f3b-27e7-f711-fd69cc641360 (at 10.9.115.10@o2ib4), client will retry: rc -110 Jul 29 04:30:20 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages 
Jul 29 04:30:21 fir-md1-s1 kernel: LustreError: 35089:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1175729050 x1637108405008048/t0(0) o3->59f5c312-adc4-b4a9-05e0-8c37d188c47f@10.9.112.13@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1564399832 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:21 fir-md1-s1 kernel: LustreError: 35089:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 44 previous similar messages Jul 29 04:30:23 fir-md1-s1 kernel: LustreError: 46521:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f33722f9450 x1638780458478448/t0(0) o4->927ebcad-3373-a003-8433-ef313bb0111b@10.8.15.9@o2ib6:8/0 lens 488/448 e 1 to 0 dl 1564399838 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6), client will retry: rc -110 Jul 29 04:30:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 04:30:23 fir-md1-s1 kernel: LustreError: 46521:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 29 04:30:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 49aa8323-a38d-3237-508c-ea94c68aa863 (at 10.9.108.53@o2ib4), client will retry: rc = -110 Jul 29 04:30:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 29 04:30:33 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f305e9ca050 x1638904540119008/t0(0) o3->d8428b3f-ceef-fb57-6c0a-b3ad15aaf988@10.8.27.7@o2ib6:8/0 lens 488/440 e 1 to 0 dl 1564399838 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:38 fir-md1-s1 kernel: LustreError: 24566:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f319710fc50 x1634519845481440/t0(0) o4->eaf995be-0d27-b013-5e90-e619713af34c@10.8.13.6@o2ib6:8/0 lens 520/456 e 1 to 0 dl 1564399838 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 04:30:38 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Bulk IO write error with 4c916b4c-f077-8202-b2a1-76eae483981d (at 10.8.24.12@o2ib6), client will retry: rc = -110 Jul 29 04:30:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 04:30:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d8428b3f-ceef-fb57-6c0a-b3ad15aaf988 (at 10.8.27.7@o2ib6), client will retry: rc -110 Jul 29 04:30:38 fir-md1-s1 kernel: LustreError: 24566:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 29 04:34:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 04:34:07 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 04:37:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 04:37:23 fir-md1-s1 kernel: Lustre: Skipped 697 previous similar messages Jul 29 04:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 04:39:43 fir-md1-s1 kernel: Lustre: Skipped 493 previous similar messages Jul 29 04:42:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 29 04:42:58 fir-md1-s1 kernel: Lustre: Skipped 197 previous similar messages Jul 29 04:45:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 04:45:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 04:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 04:47:24 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 29 04:47:24 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 29 04:47:24 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 29 04:50:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 04:50:18 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 04:53:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 04:53:40 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 29 04:55:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 04:55:24 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 04:57:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 04:57:25 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 05:00:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 05:00:24 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 29 05:03:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 05:03:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 29 05:05:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 05:05:51 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 29 05:06:12 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f075a001c50 x1638887318550672/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564401977 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 05:06:12 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 29 05:06:19 fir-md1-s1 kernel: LustreError: 46570:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f075a001c50 x1638887318550672/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564401977 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 05:06:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1277529-cbf1-b0b5-ff2d-5b114cf66536 (at 10.9.112.14@o2ib4), client will retry: rc -110 Jul 29 05:06:19 fir-md1-s1 
kernel: Lustre: 46570:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (22:2s); client may timeout. req@ffff8f075a001c50 x1638887318550672/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564401977 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 05:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 05:08:48 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 29 05:11:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 05:11:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 05:12:16 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f21c409e450 x1638887331360128/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564402353 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 05:12:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1277529-cbf1-b0b5-ff2d-5b114cf66536 (at 10.9.112.14@o2ib4), client will retry: rc -110 Jul 29 05:12:16 fir-md1-s1 kernel: LustreError: 46574:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f207e74a450 x1638887331360992/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564402353 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 05:12:16 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 29 05:13:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 05:13:44 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 29 05:16:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 05:16:56 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 05:19:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 05:19:42 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 29 05:21:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 05:21:25 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 29 05:23:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 05:23:45 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 29 05:27:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 05:27:54 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 29 05:29:16 fir-md1-s1 kernel: Lustre: 10305:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564403349/real 1564403349] req@ffff8f14a2498c00 x1636748771604032/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564403356 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 05:29:38 fir-md1-s1 kernel: Lustre: 23558:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564403371/real 1564403371] req@ffff8f14a249fb00 x1636748771667936/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564403378 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 05:30:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 05:30:05 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 05:31:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 05:31:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 05:33:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 05:33:48 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 05:38:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 05:38:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 05:40:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 05:40:18 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 29 05:41:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 05:41:56 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 29 05:43:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 05:43:48 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 05:50:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 05:50:20 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 29 05:52:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 05:52:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 05:53:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b4a7b7d5-de58-1236-2dcc-45c9afa77e7c (at 10.9.109.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33b56c7800, cur 1564404788 expire 1564404638 last 1564404561 Jul 29 05:53:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ae4f58f-2e5c-44a9-3904-2fb330d81877 (at 10.9.109.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1cea469c00, cur 1564404802 expire 1564404652 last 1564404575 Jul 29 05:53:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 05:57:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 05:57:33 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 06:00:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 06:00:27 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 29 06:02:45 fir-md1-s1 kernel: Lustre: 71828:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ebf2ccb00 x1637166596187904/t0(0) o37->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:20/0 lens 448/440 e 1 to 0 dl 1564405370 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 06:02:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 06:02:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 06:02:53 fir-md1-s1 kernel: LustreError: 71866:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0ebf2ccb00 x1637166596187904/t0(0) o37->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:20/0 lens 448/440 e 1 to 0 dl 1564405370 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 06:02:53 fir-md1-s1 kernel: LustreError: 71866:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 3 previous similar messages Jul 29 06:02:53 fir-md1-s1 kernel: Lustre: 71866:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. 
req@ffff8f0ebf2ccb00 x1637166596187904/t0(0) o37->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:20/0 lens 448/408 e 1 to 0 dl 1564405370 ref 1 fl Complete:/0/0 rc -107/-107 Jul 29 06:02:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 06:02:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 06:06:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 06:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 29 06:08:06 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 29 06:10:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 06:11:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 06:11:05 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 06:12:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 06:12:55 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 06:15:57 fir-md1-s1 kernel: Lustre: 21683:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f09dfc57c50 x1639241541992976/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1564406162 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 06:18:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 29 06:18:52 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 29 06:21:19 fir-md1-s1 kernel: Lustre: 81719:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06fb169050 x1639241555271472/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:24/0 lens 488/440 e 1 to 0 dl 1564406484 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 06:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 06:21:21 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 29 06:23:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 06:23:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 06:24:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 06:24:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 06:29:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 06:29:29 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 06:31:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 06:31:58 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 29 06:33:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 06:33:11 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 06:36:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 06:36:58 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 06:39:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 06:39:30 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 29 06:42:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 06:42:22 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 29 06:43:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 06:43:52 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 06:49:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 06:49:38 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 06:51:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 06:51:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 06:52:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 06:52:23 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 29 06:53:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 06:53:54 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 06:59:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 06:59:54 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 07:02:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 07:02:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 07:02:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 07:02:48 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 29 07:04:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 07:04:16 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 07:11:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 07:11:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 07:12:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 07:12:49 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 29 07:15:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 07:15:06 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 29 07:22:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 07:22:51 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 07:22:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 07:22:51 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 29 07:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 07:25:19 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 07:26:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 07:26:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 07:29:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 07:32:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 07:32:56 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 29 07:33:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 07:33:23 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 07:34:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 07:34:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 07:35:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 07:35:43 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 07:43:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 07:43:15 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 29 07:43:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 07:43:24 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 29 07:46:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 07:46:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 07:47:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 07:47:04 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 07:53:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 07:53:52 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 29 07:56:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 07:56:03 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 29 07:57:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 07:57:28 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 08:00:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 08:00:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 08:04:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 08:04:00 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 08:06:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 08:06:10 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 29 08:08:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 08:08:34 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 08:11:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 08:11:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 08:14:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 08:14:18 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 29 08:16:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 08:16:22 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 29 08:20:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 08:20:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564413611/real 0] req@ffff8f3dcbe84b00 x1636748808291440/t0(0) o13->fir-OST001f-osc-MDT0000@10.0.10.106@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564413618 ref 2 fl Rpc:X/0/ffffffff rc 
0/-1 Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: 20241:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564413611/real 0] req@ffff8f4180d3d700 x1636748808291008/t0(0) o13->fir-OST000e-osc-MDT0000@10.0.10.103@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564413618 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: 20239:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564413611/real 0] req@ffff8f4180d3f200 x1636748808291408/t0(0) o13->fir-OST0015-osc-MDT0000@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564413618 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: 20209:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564413611/real 0] req@ffff8f0e6555e300 x1636748808291120/t0(0) o13->fir-OST0014-osc-MDT0002@10.0.10.103@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564413618 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: fir-OST0015-osc-MDT0000: Connection to fir-OST0015 (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 29 08:20:18 fir-md1-s1 kernel: Lustre: 20240:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 37 previous similar messages Jul 29 08:20:19 fir-md1-s1 kernel: Lustre: 20245:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564413611/real 0] req@ffff8f39973c2400 x1636748808291552/t0(0) o13->fir-OST002c-osc-MDT0000@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564413618 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:20:19 fir-md1-s1 kernel: Lustre: fir-OST000e-osc-MDT0002: Connection to fir-OST000e (at 10.0.10.103@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:20:19 fir-md1-s1 kernel: Lustre: 
fir-OST002e-osc-MDT0002: Connection to fir-OST002e (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:20:19 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 29 08:20:19 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 29 08:20:19 fir-md1-s1 kernel: Lustre: 20245:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 27582:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.4@o2ib4: deadline 6:4s ago req@ffff8f1f90510850 x1639158424996176/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:16/0 lens 488/0 e 0 to 0 dl 1564413616 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 27582:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 17 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 27582:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:4s); client may timeout. 
req@ffff8f1f90510850 x1639158424996176/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:16/0 lens 488/0 e 0 to 0 dl 1564413616 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 31004:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 10s req@ffff8f38b6150600 x1639197339141552/t0(0) o103->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:0/0 lens 344/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 31004:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 28235:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=1, delay=0 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 28235:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 7 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 28235:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f38b6150600 x1639197339141552/t0(0) o103->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:16/0 lens 344/0 e 0 to 0 dl 1564413616 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 28235:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 69 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 4 seconds Jul 29 08:20:20 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (10): c: 1, oc: 0, rc: 8 Jul 29 08:20:20 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 21293:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -5+5s req@ffff8f16efe59c50 x1638891815719008/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564413615 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:20 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 4 seconds Jul 29 08:20:20 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 19 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16d86ab800 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with efb86e40-78e4-0377-026b-476ce03a25a4 (at 10.8.28.1@o2ib6), client will retry: rc -110 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f287bf3fc00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc 
ffff8f106f35f600 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0fe6cfce00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f2950768800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3406e79c00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f287bf3ee00 Jul 29 08:20:20 fir-md1-s1 kernel: LNetError: 23716:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.28.1@o2ib6 from 10.0.10.51@o2ib7 Jul 29 08:20:20 fir-md1-s1 kernel: LNetError: 23716:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 190 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ea0996200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3406e78e00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16d86ab800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ea0995000 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f36fd7e7c00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22f261a200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32e9f86000 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f2ea0991000 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3763662800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ea0995000 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3763660800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3406e78e00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3406e78800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ea0997200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20198:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ea0993600 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3406e7f800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f106f35ba00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15dd3dd800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2b1d6dda00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0fe6cfc200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f295076c200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status 
-5, desc ffff8f36fd7e0200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0549a71400 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3763665600 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f16ea428200 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2eb37e8400 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3763665c00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3763660c00 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0549a75800 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0549a74600 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 3, status -5, desc ffff8f291d114600 Jul 29 08:20:20 fir-md1-s1 kernel: LustreError: 24565:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff8f2e2ea99450 x1639983635631056/t0(0) o4->baaf9aa6-d6ac-d219-ff91-f47dd67dd412@10.8.29.6@o2ib6:0/0 lens 488/448 e 1 to 0 dl 1564413630 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with baaf9aa6-d6ac-d219-ff91-f47dd67dd412 (at 10.8.29.6@o2ib6), client will retry: rc = -110 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 
46534:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=53 reqQ=0 recA=6, svcEst=11, delay=9411 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 46534:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 3 previous similar messages Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 46534:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f253a02a450 x1635716452683696/t0(0) o3->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:16/0 lens 488/0 e 0 to 0 dl 1564413616 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 08:20:20 fir-md1-s1 kernel: Lustre: 46534:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Jul 29 08:20:24 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+9s req@ffff8f1e07a79050 x1638934098957056/t0(0) o3->0074f13d-7764-019e-fa05-08395204d95a@10.9.112.10@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564413615 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 89c5b213-fa16-71ad-d5f3-58d49989ce10 (at 10.9.115.11@o2ib4), client will retry: rc -110 Jul 29 08:20:24 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 29 08:20:24 fir-md1-s1 kernel: Lustre: 21448:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:9s); client may timeout. 
req@ffff8f220eb55850 x1634137770298688/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1564413615 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:20:24 fir-md1-s1 kernel: Lustre: 21448:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 58 previous similar messages Jul 29 08:20:24 fir-md1-s1 kernel: LustreError: 42894:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 21 previous similar messages Jul 29 08:20:24 fir-md1-s1 kernel: Lustre: 46541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3d41f67050 x1634137770298624/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564413629 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:24 fir-md1-s1 kernel: Lustre: 46541:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Jul 29 08:20:25 fir-md1-s1 kernel: LustreError: 69435:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+9s req@ffff8f1ec3d14450 x1638796794015136/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:16/0 lens 488/440 e 0 to 0 dl 1564413616 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d8d6f8e7-a2cd-08f2-c263-fa8b0dbeef3c (at 10.8.8.2@o2ib6), client will retry: rc -110 Jul 29 08:20:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 08:20:25 fir-md1-s1 kernel: Lustre: 21544:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:9s); client may timeout. 
req@ffff8f1b19e3f050 x1631617870488656/t0(0) o3->d8d6f8e7-a2cd-08f2-c263-fa8b0dbeef3c@10.8.8.2@o2ib6:16/0 lens 488/440 e 0 to 0 dl 1564413616 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:20:25 fir-md1-s1 kernel: Lustre: 21544:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 29 08:20:25 fir-md1-s1 kernel: LustreError: 69435:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 29 08:20:25 fir-md1-s1 kernel: Lustre: 46541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f4447f3cc50 x1638796794015520/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:0/0 lens 488/440 e 1 to 0 dl 1564413630 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:25 fir-md1-s1 kernel: Lustre: 46541:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 33 previous similar messages Jul 29 08:20:27 fir-md1-s1 kernel: Lustre: 20248:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564413620/real 1564413620] req@ffff8f3ec0348f00 x1636748808292912/t0(0) o13->fir-OST002b-osc-MDT0000@10.0.10.108@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564413627 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:20:27 fir-md1-s1 kernel: Lustre: fir-OST001f-osc-MDT0002: Connection to fir-OST001f (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:20:27 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 29 08:20:27 fir-md1-s1 kernel: Lustre: 20248:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 34 previous similar messages Jul 29 08:20:28 fir-md1-s1 kernel: LustreError: 20905:0:(osp_precreate.c:940:osp_precreate_cleanup_orphans()) fir-OST0023-osc-MDT0000: cannot cleanup orphans: rc = -11 Jul 29 08:20:29 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 9+0s req@ffff8f29f5360c50 
x1639241904917312/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564413629 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f7eae5f9-18e9-99eb-0207-24a1fdf92451 (at 10.9.113.2@o2ib4), client will retry: rc -110 Jul 29 08:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 29 08:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 29 08:20:29 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 29 08:20:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 29 08:20:29 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 29 08:20:29 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 24 previous similar messages Jul 29 08:20:30 fir-md1-s1 kernel: Lustre: 24563:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f2e2ea98450 x1638802541586128/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564413629 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:20:30 fir-md1-s1 kernel: Lustre: 24563:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 29 08:20:34 fir-md1-s1 kernel: LustreError: 20501:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 9+5s req@ffff8f0c3fc59050 x1634534596126880/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564413629 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:34 fir-md1-s1 kernel: LustreError: 20501:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 47 previous similar messages Jul 29 08:20:34 fir-md1-s1 kernel: Lustre: 35236:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:5s); client may timeout. req@ffff8f0bca9d1c50 x1639158424995872/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:29/0 lens 488/440 e 1 to 0 dl 1564413629 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:20:34 fir-md1-s1 kernel: Lustre: 35236:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 29 08:20:35 fir-md1-s1 kernel: Lustre: 21743:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3996799450 x1634137770300208/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:10/0 lens 488/440 e 1 to 0 dl 1564413640 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:35 fir-md1-s1 kernel: Lustre: 21743:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Jul 29 08:20:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 89c5b213-fa16-71ad-d5f3-58d49989ce10 (at 10.9.115.11@o2ib4), client will retry: rc -110 Jul 29 08:20:40 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 08:20:42 fir-md1-s1 kernel: Lustre: 
20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23053bda00 x1634468795805472/t0(0) o101->ec8e478a-93b2-34d3-2772-2238a12dddbe@10.8.18.28@o2ib6:17/0 lens 1776/3288 e 1 to 0 dl 1564413647 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:42 fir-md1-s1 kernel: Lustre: 20730:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 29 08:20:48 fir-md1-s1 kernel: Lustre: 97657:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f242bf9f500 x1635088463003328/t0(0) o101->ab156947-f66a-d1d7-84dc-bc4d0ff395c3@10.9.104.41@o2ib4:23/0 lens 576/3264 e 1 to 0 dl 1564413653 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:20:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.102.1@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3ee1ba8480/0x5d9ee699986ce4d1 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 190 type: IBT flags: 0x60200400000020 nid: 10.9.102.1@o2ib4 remote: 0xf910cbc8e23862b8 expref: 571 pid: 23667 timeout: 3528716 lvb_type: 0 Jul 29 08:20:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 29 08:21:04 fir-md1-s1 kernel: Lustre: 20459:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10bc143000 x1635091774987664/t0(0) o101->65cee6f7-278a-50e1-f966-888a0bdc6354@10.9.109.30@o2ib4:9/0 lens 1776/3288 e 1 to 0 dl 1564413669 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:24:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 08:24:49 fir-md1-s1 kernel: Lustre: Skipped 2406 previous similar messages Jul 29 08:25:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 08:25:50 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 08:26:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 08:26:24 fir-md1-s1 kernel: Lustre: Skipped 734 previous similar messages Jul 29 08:30:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 08:30:15 fir-md1-s1 kernel: Lustre: Skipped 1576 previous similar messages Jul 29 08:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 08:35:01 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 29 08:36:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 08:36:38 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 29 08:40:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 08:40:35 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 29 08:45:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 08:45:15 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 29 08:47:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 29 08:47:17 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 08:49:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 08:49:19 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 08:50:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 08:50:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 08:55:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 08:55:31 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 29 08:56:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 08:57:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 08:57:20 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 29 08:57:33 fir-md1-s1 kernel: LustreError: 44044:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f212604a850 x1638891892815008/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564415866 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1800347-72ce-eadd-608d-51a435000390 (at 10.9.112.15@o2ib4), client will retry: rc -110 Jul 29 08:57:34 fir-md1-s1 kernel: Lustre: 20245:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564415847/real 0] req@ffff8f44bc0e0000 x1636748820422432/t0(0) o13->fir-OST0023-osc-MDT0002@10.0.10.106@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564415854 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:57:34 fir-md1-s1 kernel: Lustre: 20245:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 29 08:57:34 fir-md1-s1 kernel: LustreError: 21534:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ 
req@ffff8f3e18759050 x1638958251462688/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:18/0 lens 488/440 e 1 to 0 dl 1564415868 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:34 fir-md1-s1 kernel: Lustre: fir-OST0023-osc-MDT0002: Connection to fir-OST0023 (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:57:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 08:57:35 fir-md1-s1 kernel: LustreError: 46578:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1d847ccc50 x1636353402098528/t0(0) o3->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:26/0 lens 488/440 e 0 to 0 dl 1564415876 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:35 fir-md1-s1 kernel: LustreError: 46578:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 29 08:57:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f7eae5f9-18e9-99eb-0207-24a1fdf92451 (at 10.9.113.2@o2ib4), client will retry: rc -110 Jul 29 08:57:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 97599:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.113.3@o2ib4: deadline 6:3s ago req@ffff8f2973b09850 x1638802633793920/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:3/0 lens 488/0 e 0 to 0 dl 1564415853 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 97599:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 27 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 46571:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=9 reqQ=0 recA=27, svcEst=20, delay=7141 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 97599:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff8f2973b09850 x1638802633793920/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:3/0 lens 488/0 e 0 to 0 dl 1564415853 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 97599:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f275d063c50 x1636353402099056/t0(0) o3->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:27/0 lens 488/440 e 0 to 0 dl 1564415877 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564415848/real 0] req@ffff8f169e8d3000 x1636748820422560/t0(0) o6->fir-OST0020-osc-MDT0002@10.0.10.105@o2ib7:28/4 lens 544/432 e 0 to 1 dl 1564415855 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 46571:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0bc78d7850 x1637398589688128/t0(0) o4->6eed6c6e-bf9d-6eed-41d9-2953d0976391@10.9.101.4@o2ib4:4/0 lens 488/448 e 0 to 0 dl 1564415854 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 20238:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 46571:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 52 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: fir-OST0020-osc-MDT0002: Connection to fir-OST0020 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 46810:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 9s req@ffff8f2fc6d48f00 x1638090429248464/t0(0) o103->5fc014af-e3d7-51ad-6083-2ba5cb7bd6c2@10.9.114.8@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 46810:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 7 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 3 seconds Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 14 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 1, oc: 3, rc: 3 Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 14 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 20 previous similar 
messages Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f2df26bbe00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 24569:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff8f275d065050 x1638912366775408/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564415853 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 24569:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with d3c03fa2-3e41-4741-cf2d-21c94adb10e5 (at 10.9.108.40@o2ib4), client will retry: rc = -107 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f38c8fa0a00 Jul 29 08:57:36 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 2239 seconds Jul 29 08:57:36 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 147 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0982bc7400 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0982bc0a00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f22f261c200 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20191:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f22f261d800 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f168b4cec00 Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 21412:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 
10.8.16.8@o2ib6 from 10.0.10.51@o2ib7 Jul 29 08:57:36 fir-md1-s1 kernel: LNetError: 21412:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 12 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 48201:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f372f6d3600 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 22432:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f18d53e6000 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2df26b8000 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f298d3f4600 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f22f261b800 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f084a4f5c00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 48198:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2df26bcc00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 46563:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f16d86aee00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 21294:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f298d3f3600 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 21542:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f36ea289c00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f298d3f7600 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 21717:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f084a4f2a00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 48195:0:(events.c:450:server_bulk_callback()) event type 5, 
status -113, desc ffff8f319d513a00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 23107:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f2df26b9800 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f33f22a9e00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0ebf31c200 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f168b4cf600 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e3e2fea00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e3e2fa200 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 49252:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 49252:0:(pack_generic.c:590:__lustre_unpack_msg()) Skipped 1 previous similar message Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 49252:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.102.22@o2ib4 x1638869058554464 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 49252:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) Skipped 1 previous similar message Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e3e2fc400 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f16d86ade00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f298d3f0400 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2df26bbc00 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f168b4c8600 Jul 29 08:57:36 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f084a4f6a00 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 21245:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=20 reqQ=0 recA=27, svcEst=20, delay=8143 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 21245:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 21245:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f3505ed4c50 x1638802633794016/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:3/0 lens 488/0 e 0 to 0 dl 1564415853 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 08:57:36 fir-md1-s1 kernel: Lustre: 21245:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Jul 29 08:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with aec69d6f-8b9d-1fe2-74fb-aa6ac6ee7bb1 (at 10.9.106.63@o2ib4), client will retry: rc = -110 Jul 29 08:57:38 fir-md1-s1 kernel: LustreError: 21686:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3e7baee050 x1631650642472048/t0(0) o4->6d4d8c33-ecef-fdb4-378f-8ac8e4e1e0ce@10.9.101.34@o2ib4:0/0 lens 488/448 e 0 to 0 dl 1564415880 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:38 fir-md1-s1 kernel: LustreError: 21686:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 29 08:57:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 
6d4d8c33-ecef-fdb4-378f-8ac8e4e1e0ce (at 10.9.101.34@o2ib4), client will retry: rc = -110 Jul 29 08:57:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with efb86e40-78e4-0377-026b-476ce03a25a4 (at 10.8.28.1@o2ib6), client will retry: rc -110 Jul 29 08:57:40 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 29 08:57:41 fir-md1-s1 kernel: LustreError: 49228:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 6+7s req@ffff8f0bc78d7850 x1637398589688128/t0(0) o4->6eed6c6e-bf9d-6eed-41d9-2953d0976391@10.9.101.4@o2ib4:4/0 lens 488/448 e 0 to 0 dl 1564415854 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:41 fir-md1-s1 kernel: LustreError: 49228:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 14 previous similar messages Jul 29 08:57:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6eed6c6e-bf9d-6eed-41d9-2953d0976391 (at 10.9.101.4@o2ib4), client will retry: rc = -110 Jul 29 08:57:41 fir-md1-s1 kernel: Lustre: 49228:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:7s); client may timeout. 
req@ffff8f0bc78d7850 x1637398589688128/t0(0) o4->6eed6c6e-bf9d-6eed-41d9-2953d0976391@10.9.101.4@o2ib4:4/0 lens 488/448 e 0 to 0 dl 1564415854 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:57:41 fir-md1-s1 kernel: Lustre: 49228:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 28 previous similar messages Jul 29 08:57:41 fir-md1-s1 kernel: Lustre: 21741:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3e1875c850 x1638767281307616/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564415866 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:41 fir-md1-s1 kernel: Lustre: 21741:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 29 08:57:42 fir-md1-s1 kernel: LustreError: 46538:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f4413efcc50 x1636353402099056/t0(0) o3->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:4/0 lens 488/440 e 0 to 0 dl 1564415884 ref 1 fl Interpret:/2/0 rc 0/0 Jul 29 08:57:42 fir-md1-s1 kernel: LustreError: 46538:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 29 08:57:43 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564415856/real 1564415856] req@ffff8f0b8b46fb00 x1636748820424672/t0(0) o106->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564415863 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:57:43 fir-md1-s1 kernel: Lustre: fir-OST0021-osc-MDT0002: Connection to fir-OST0021 (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:57:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 29 08:57:43 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 47 previous similar messages Jul 29 08:57:43 fir-md1-s1 kernel: Lustre: 
22225:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3e1875a850 x1637398589688352/t0(0) o4->6eed6c6e-bf9d-6eed-41d9-2953d0976391@10.9.101.4@o2ib4:18/0 lens 488/448 e 1 to 0 dl 1564415868 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:43 fir-md1-s1 kernel: Lustre: 22225:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jul 29 08:57:47 fir-md1-s1 kernel: LustreError: 27587:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 11+0s req@ffff8f23eb8bc050 x1639242011969104/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564415867 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:47 fir-md1-s1 kernel: Lustre: 56757:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f37cbe73c50 x1638912366775104/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564415866 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:57:47 fir-md1-s1 kernel: LustreError: 27587:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Jul 29 08:57:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f914310c-7825-8c6a-2b04-354707ee5046 (at 10.9.113.3@o2ib4), client will retry: rc -110 Jul 29 08:57:48 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 29 08:57:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6eed6c6e-bf9d-6eed-41d9-2953d0976391 (at 10.9.101.4@o2ib4), client will retry: rc = -110 Jul 29 08:57:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 08:57:51 fir-md1-s1 kernel: Lustre: 46552:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2281142850 x1638912366776032/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:26/0 lens 488/440 e 1 to 0 dl 1564415876 
ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:51 fir-md1-s1 kernel: Lustre: 46552:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 29 08:57:56 fir-md1-s1 kernel: LustreError: 16648:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f0c19a58850 x1634137839717840/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:26/0 lens 488/440 e 0 to 0 dl 1564415876 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:57:56 fir-md1-s1 kernel: LustreError: 16648:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 19 previous similar messages Jul 29 08:57:58 fir-md1-s1 kernel: Lustre: 66902:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f4017dd4c50 x1638767281307648/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:26/0 lens 488/440 e 0 to 0 dl 1564415876 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 29 08:57:58 fir-md1-s1 kernel: Lustre: 66902:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Jul 29 08:57:59 fir-md1-s1 kernel: Lustre: 20247:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564415848/real 1564415856] req@ffff8f169e8d6000 x1636748820422672/t0(0) o6->fir-OST0019-osc-MDT0002@10.0.10.106@o2ib7:28/4 lens 544/432 e 0 to 1 dl 1564415879 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 08:57:59 fir-md1-s1 kernel: Lustre: 20247:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 29 08:57:59 fir-md1-s1 kernel: Lustre: fir-OST0019-osc-MDT0002: Connection to fir-OST0019 (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 29 08:57:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 29 08:58:02 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply 
req@ffff8f2015ecd100 x1634803049524816/t355333966710(0) o36->13161d75-7fca-8358-cbb9-e2cc56095752@10.9.105.12@o2ib4:7/0 lens 488/3152 e 0 to 0 dl 1564415887 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 08:58:02 fir-md1-s1 kernel: Lustre: 20723:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 58 previous similar messages Jul 29 08:58:04 fir-md1-s1 kernel: LustreError: 25632:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 29+8s req@ffff8f1e65d3bc50 x1638870136637904/t0(0) o3->8df94149-5690-262d-f805-cc7898f99b40@10.8.16.5@o2ib6:26/0 lens 488/440 e 0 to 0 dl 1564415876 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 08:58:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 8df94149-5690-262d-f805-cc7898f99b40 (at 10.8.16.5@o2ib6), client will retry: rc -110 Jul 29 08:58:04 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 08:58:04 fir-md1-s1 kernel: LustreError: 25632:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 37 previous similar messages Jul 29 08:58:07 fir-md1-s1 kernel: Lustre: 20641:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564415856/real 1564415856] req@ffff8f176d760c00 x1636748820426144/t0(0) o5->fir-OST0021-osc-MDT0002@10.0.10.106@o2ib7:28/4 lens 432/432 e 0 to 1 dl 1564415887 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 Jul 29 08:58:07 fir-md1-s1 kernel: LustreError: 20641:0:(osp_precreate.c:656:osp_precreate_send()) fir-OST0021-osc-MDT0002: can't precreate: rc = -107 Jul 29 08:58:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.26@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f222c511b00/0x5d9ee699bb3e5567 lrc: 3/0,0 mode: PR/PR res: [0x2c002c27c:0x1bc0f:0x0].0x0 bits 0x1b/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.8.8.26@o2ib6 remote: 0xf2d4b1d7a8d2155c expref: 9432 pid: 97646 timeout: 3530947 lvb_type: 0 Jul 29 08:58:07 fir-md1-s1 kernel: LustreError: 
23616:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2f10c66900 x1636748820440480/t0(0) o104->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 29 08:58:07 fir-md1-s1 kernel: LustreError: 23616:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 29 09:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 09:00:45 fir-md1-s1 kernel: Lustre: Skipped 1397 previous similar messages Jul 29 09:05:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 09:05:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 09:05:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 09:05:33 fir-md1-s1 kernel: Lustre: Skipped 2111 previous similar messages Jul 29 09:08:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 09:08:00 fir-md1-s1 kernel: Lustre: Skipped 663 previous similar messages Jul 29 09:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 09:10:56 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 09:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 09:15:35 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 29 09:18:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 29 09:18:19 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 09:21:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 09:21:12 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 09:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 09:25:57 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 29 09:26:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 09:26:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 09:29:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 09:29:15 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 09:31:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 09:32:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 09:32:23 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 09:36:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 09:36:09 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 29 09:36:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f46dce57-e0f0-08b3-6c14-cb80f5f23489 (at 10.9.103.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3e14b23000, cur 1564418175 expire 1564418025 last 1564417948 Jul 29 09:39:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 09:39:18 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 09:40:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 09:42:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 09:42:50 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 09:45:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 09:45:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 09:46:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 09:46:24 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 29 09:49:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 29 09:49:28 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 29 09:53:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 09:53:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 09:56:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 09:56:35 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 29 10:00:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former 
export from same NID Jul 29 10:00:41 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 10:01:25 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419678/real 1564419678] req@ffff8f15347fd400 x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419685 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 10:01:25 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 29 10:01:33 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419685/real 1564419685] req@ffff8f15347fd400 x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419692 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 29 10:01:33 fir-md1-s1 kernel: Lustre: 21417:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f070729b300 x1638958351182096/t0(0) o101->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:8/0 lens 576/3264 e 1 to 0 dl 1564419698 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:01:33 fir-md1-s1 kernel: Lustre: 21417:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Jul 29 10:01:35 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1da7394b00 x1639990719405648/t0(0) o101->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:10/0 lens 576/3264 e 1 to 0 dl 1564419700 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:01:35 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 51 previous similar messages Jul 29 10:01:40 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419693/real 1564419693] 
req@ffff8f15347fd400 x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419700 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 29 10:01:43 fir-md1-s1 kernel: Lustre: 23599:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f080ab6e000 x1634534680164960/t0(0) o101->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:18/0 lens 576/3264 e 0 to 0 dl 1564419708 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:01:43 fir-md1-s1 kernel: Lustre: 23599:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Jul 29 10:01:47 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419700/real 1564419700] req@ffff8f15347fd400 x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419707 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 29 10:01:56 fir-md1-s1 kernel: Lustre: 23621:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3fb155dd00 x1637986666221472/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:1/0 lens 576/3264 e 0 to 0 dl 1564419721 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:01:56 fir-md1-s1 kernel: Lustre: 23621:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 165 previous similar messages Jul 29 10:02:01 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419714/real 1564419714] req@ffff8f15347fd400 x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419721 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 29 10:02:01 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 29 10:02:15 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add 
any time (5/-26), not sending early reply req@ffff8f2fc7913300 x1638888493893712/t0(0) o101->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:20/0 lens 576/3264 e 0 to 0 dl 1564419740 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:02:15 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Jul 29 10:02:22 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419735/real 1564419735] req@ffff8f15347fd400 x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419742 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 29 10:02:22 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 29 10:02:47 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-25), not sending early reply req@ffff8f204244a400 x1638540529080144/t0(0) o101->1890d675-ce1f-cd8f-dea3-5b5821d43c68@10.8.0.67@o2ib6:22/0 lens 576/3264 e 0 to 0 dl 1564419772 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:02:47 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jul 29 10:02:49 fir-md1-s1 kernel: LustreError: 10146:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419678, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2e732dda00/0x5d9ee699fd3c7352 lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 595 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10146 timeout: 0 lvb_type: 0 Jul 29 10:02:49 fir-md1-s1 kernel: LustreError: 10146:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 143 previous similar messages Jul 29 10:02:49 fir-md1-s1 kernel: LustreError: 
25680:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419679, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f348c751440/0x5d9ee699fd3ccc84 lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 595 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 25680 timeout: 0 lvb_type: 0 Jul 29 10:02:49 fir-md1-s1 kernel: LustreError: 25680:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 30 previous similar messages Jul 29 10:02:50 fir-md1-s1 kernel: LustreError: 24581:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419680, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2665ba4c80/0x5d9ee699fd3d50c7 lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 595 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24581 timeout: 0 lvb_type: 0 Jul 29 10:02:50 fir-md1-s1 kernel: LustreError: 24581:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 30 previous similar messages Jul 29 10:02:52 fir-md1-s1 kernel: LustreError: 50448:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419682, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f18424a0fc0/0x5d9ee699fd3eb49a lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 597 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50448 timeout: 0 lvb_type: 0 Jul 29 10:02:52 fir-md1-s1 kernel: LustreError: 50448:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 27 previous similar messages Jul 29 10:02:57 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564419770/real 1564419770] req@ffff8f15347fd400 
x1636748843559712/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564419777 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 29 10:02:57 fir-md1-s1 kernel: Lustre: 23660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jul 29 10:02:57 fir-md1-s1 kernel: LustreError: 97643:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419687, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0d62b51440/0x5d9ee699fd41acf3 lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 603 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97643 timeout: 0 lvb_type: 0 Jul 29 10:02:57 fir-md1-s1 kernel: LustreError: 97643:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 10 previous similar messages Jul 29 10:03:11 fir-md1-s1 kernel: LustreError: 21458:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419701, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1fb9af4a40/0x5d9ee699fd4b162e lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 605 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21458 timeout: 0 lvb_type: 0 Jul 29 10:03:11 fir-md1-s1 kernel: LustreError: 21458:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Jul 29 10:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f914310c-7825-8c6a-2b04-354707ee5046 (at 10.9.113.3@o2ib4) reconnecting Jul 29 10:03:22 fir-md1-s1 kernel: Lustre: Skipped 251 previous similar messages Jul 29 10:03:28 fir-md1-s1 kernel: LustreError: 97664:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564419717, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: 
ffff8f1e84eec380/0x5d9ee699fd5544ff lrc: 3/1,0 mode: --/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 611 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97664 timeout: 0 lvb_type: 0 Jul 29 10:03:28 fir-md1-s1 kernel: LustreError: 97664:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 16 previous similar messages Jul 29 10:03:52 fir-md1-s1 kernel: Lustre: 97671:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f189fce3c00 x1635240280396816/t0(0) o101->5856c966-b502-541a-bf79-fec68258d993@10.9.101.29@o2ib4:27/0 lens 576/0 e 0 to 0 dl 1564419837 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 29 10:03:52 fir-md1-s1 kernel: Lustre: 97671:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 260 previous similar messages Jul 29 10:03:53 fir-md1-s1 kernel: LustreError: 23660:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) failed to reply to blocking AST (req@ffff8f15347fd400 x1636748843559712 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2f355c72c0/0x5d9ee699e55ff42d lrc: 4/0,0 mode: PR/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 619 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x1098607029d035e expref: 61 pid: 10143 timeout: 3535035 lvb_type: 0 Jul 29 10:03:53 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Jul 29 10:03:53 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f355c72c0/0x5d9ee699e55ff42d lrc: 3/0,0 mode: PR/PR res: [0x2c0000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 619 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x1098607029d035e expref: 62 pid: 10143 timeout: 0 lvb_type: 0 Jul 29 10:03:53 fir-md1-s1 kernel: Lustre: 
26254:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:28s); client may timeout. req@ffff8f15e3099200 x1636450211843136/t0(0) o101->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:25/0 lens 608/0 e 0 to 0 dl 1564419805 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 10:03:53 fir-md1-s1 kernel: LustreError: 97641:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.12@o2ib6: deadline 30:1s ago req@ffff8f20f2b30c00 x1634932131689408/t0(0) o101->8f367c70-6bbd-359c-a9cb-016bde9e7ec3@10.8.27.12@o2ib6:22/0 lens 576/0 e 0 to 0 dl 1564419832 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Jul 29 10:03:53 fir-md1-s1 kernel: LustreError: 97641:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Jul 29 10:03:53 fir-md1-s1 kernel: Lustre: 26254:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 295 previous similar messages Jul 29 10:04:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dd73e4df-b09b-ed59-d0a4-c8564f1e4a97 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3336ea4800, cur 1564419878 expire 1564419728 last 1564419651 Jul 29 10:04:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 29 10:04:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 10:06:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 10:06:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 10:06:53 fir-md1-s1 kernel: Lustre: Skipped 649 previous similar messages Jul 29 10:10:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 10:10:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 10:10:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 10:10:46 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 10:12:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bddaf8a5-4b37-2a76-1779-67b6a3b482d3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16f8b27800, cur 1564420368 expire 1564420218 last 1564420141 Jul 29 10:12:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 10:14:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 10:14:01 fir-md1-s1 kernel: Lustre: Skipped 306 previous similar messages Jul 29 10:16:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 10:16:57 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 10:17:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 10:21:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e4f9da50-7c9f-6b70-17b0-f6f5bc26b448 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f21ceac8000, cur 1564420864 expire 1564420714 last 1564420637 Jul 29 10:21:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 10:21:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 29 10:21:26 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 10:24:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 10:24:55 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 10:25:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 10:27:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 10:27:07 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 29 10:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 00cbd137-3cff-7913-29b7-eea37f4fa3db (at 10.8.26.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f315aaac000, cur 1564421509 expire 1564421359 last 1564421282 Jul 29 10:31:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 10:33:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 10:33:59 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 29 10:34:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9c248b7-fd56-0fdd-eb42-5ecb88279b8a (at 10.9.108.68@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1aa0cf1800, cur 1564421688 expire 1564421538 last 1564421461 Jul 29 10:34:48 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 10:35:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 10:35:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 10:37:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 10:37:12 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 10:38:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 10:38:25 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 29 10:39:49 fir-md1-s1 kernel: Lustre: 13135:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0c1f4e1050 x1639242309769136/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564421994 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 10:39:49 fir-md1-s1 kernel: Lustre: 13135:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 23 previous similar messages Jul 29 10:39:54 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 29 10:39:54 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Jul 29 10:39:55 fir-md1-s1 kernel: Lustre: 21682:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f0c1f4e1050 x1639242309769136/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:24/0 lens 488/408 e 0 to 0 dl 1564421994 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 29 10:45:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 10:45:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 10:45:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 10:45:26 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 29 10:47:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 10:47:29 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 29 10:48:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 10:48:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 10:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ef0b4bf4-7e6c-f252-0aeb-4c95bbc7cc95 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f207e74b800, cur 1564422704 expire 1564422554 last 1564422477 Jul 29 10:51:44 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 29 10:51:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef0b4bf4-7e6c-f252-0aeb-4c95bbc7cc95 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1dd7bec400, cur 1564422717 expire 1564422567 last 1564422490 Jul 29 10:51:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 10:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 10:55:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 10:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1d399421-d0c9-239e-26d7-5463dff97986 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f358f6e3400, cur 1564422985 expire 1564422835 last 1564422758 Jul 29 10:56:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 10:56:56 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 10:57:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 10:57:45 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 29 11:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23eb8be000, cur 1564423268 expire 1564423118 last 1564423041 Jul 29 11:01:08 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 29 11:01:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 11:01:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 11:06:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 11:06:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 29 11:07:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 11:07:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 29 11:07:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 11:07:53 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 29 11:08:46 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f134faac850 x1638802973457904/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:21/0 lens 488/440 e 0 to 0 dl 1564423731 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 11:12:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 11:12:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 11:16:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 11:16:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 11:17:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 11:17:44 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 29 11:17:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 11:17:55 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 29 11:18:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f3d67849-36ad-6531-020b-9dece16a1885 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e206b6400, cur 1564424288 expire 1564424138 last 1564424061 Jul 29 11:18:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f3d67849-36ad-6531-020b-9dece16a1885 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e206b1000, cur 1564424290 expire 1564424140 last 1564424063 Jul 29 11:18:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 11:22:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 11:22:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 11:26:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 11:26:52 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 11:27:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 11:27:50 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 29 11:27:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 11:27:56 fir-md1-s1 kernel: Lustre: Skipped 120 previous similar messages Jul 29 11:33:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 11:33:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 11:37:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 11:37:14 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 11:37:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 11:37:57 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 29 11:40:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 11:40:42 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 11:42:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5c7f81b5-478b-c278-cf5f-1d3da1a35495 (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2881637800, cur 1564425772 expire 1564425622 last 1564425545 Jul 29 11:44:07 fir-md1-s1 kernel: Lustre: 21378:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564425840/real 1564425840] req@ffff8f350daf3300 x1636748876694944/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564425847 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 11:44:07 fir-md1-s1 kernel: Lustre: 21378:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jul 29 11:44:14 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 11:46:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 11:46:13 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 11:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 11:47:15 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 11:47:26 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 11:47:26 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 29 11:47:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0b766838-89ea-3d2e-06ca-f7727d84cf43 (at 10.8.28.8@o2ib6) Jul 29 11:47:57 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 29 11:50:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 11:50:42 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 29 11:51:19 fir-md1-s1 kernel: Lustre: 
14792:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0c10641450 x1638912718360384/t0(0) o3->23dbfbee-8f3b-27e7-f711-fd69cc641360@10.9.115.10@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564426284 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 11:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 11:57:37 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 11:57:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 11:57:58 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 29 11:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf4bb820-0ffb-ab34-0e68-255b06f2e8d1 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a07412c00, cur 1564426793 expire 1564426643 last 1564426566 Jul 29 11:59:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 12:00:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf4bb820-0ffb-ab34-0e68-255b06f2e8d1 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16d4d50400, cur 1564426803 expire 1564426653 last 1564426576 Jul 29 12:00:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 12:00:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 12:00:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 12:01:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 12:01:12 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 12:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 12:08:10 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 12:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 12:08:10 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 29 12:11:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 12:11:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 12:13:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 12:13:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 12:18:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 12:18:11 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 29 12:18:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 12:18:18 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 12:21:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 12:21:35 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 12:26:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 12:26:11 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 12:27:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 383ec871-1dff-1901-21cf-728379261288 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16f64d9400, cur 1564428439 expire 1564428289 last 1564428212 Jul 29 12:28:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 12:28:12 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 29 12:29:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 12:29:16 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 12:31:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 12:31:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 12:34:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4450f95c00, cur 1564428884 expire 1564428734 last 1564428657 Jul 29 12:34:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 12:37:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 12:37:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 12:38:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 12:38:18 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 29 12:39:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 12:39:17 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 12:42:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 12:42:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 12:43:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 86f20642-398a-1302-4185-861dad4e0bb8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34c7ae5400, cur 1564429438 expire 1564429288 last 1564429211 Jul 29 12:49:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 12:49:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 12:49:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 12:49:23 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 29 12:51:27 fir-md1-s1 kernel: Lustre: 20500:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f14f4b09450 x1638892498767680/t0(0) o3->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564429892 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 12:53:12 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564429985/real 1564429985] req@ffff8f1ec7c30c00 x1636748906229552/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564429992 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 12:53:12 fir-md1-s1 kernel: Lustre: 20720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 29 12:53:18 fir-md1-s1 kernel: Lustre: 23710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564429991/real 1564429991] req@ffff8f2c153bf800 x1636748906299856/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564429998 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 12:53:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 12:53:22 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 29 12:53:50 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1564430023/real 1564430023] req@ffff8f148c415700 x1636748906598752/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564430030 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 12:54:02 fir-md1-s1 kernel: Lustre: 25680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564430035/real 1564430035] req@ffff8f2f7b196300 x1636748906718384/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564430042 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 12:54:17 fir-md1-s1 kernel: Lustre: 23670:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564430050/real 1564430050] req@ffff8f3604356000 x1636748906872688/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564430057 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 12:58:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 12:58:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 12:59:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 12:59:26 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 12:59:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 12:59:26 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 29 13:03:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 13:03:25 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 13:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6f142ff1-a36e-3abb-e2aa-e0f2cc6d21b2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2523657000, cur 1564430640 expire 1564430490 last 1564430413 Jul 29 13:04:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 13:09:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 13:09:46 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 13:09:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 13:09:46 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 13:11:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 13:11:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 13:13:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 13:13:30 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 13:18:12 fir-md1-s1 kernel: Lustre: 22428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f12d0d3e450 x1639242729165680/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564431497 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 13:19:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 13:19:52 fir-md1-s1 kernel: Lustre: Skipped 115 previous similar messages Jul 29 13:20:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 13:20:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 13:22:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 13:22:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 13:22:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9790ea0a-7373-6978-d4f8-e86719ae6e19 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f25df8400, cur 1564431765 expire 1564431615 last 1564431538 Jul 29 13:22:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 13:23:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 13:23:32 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 29 13:30:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 13:30:20 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 13:30:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 13:30:20 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 29 13:33:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 13:33:46 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 29 13:33:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 13:33:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 13:34:08 fir-md1-s1 kernel: Lustre: 59211:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f12e3e33c50 x1638958800525824/t0(0) o3->1d9bbb43-a6f6-8fcf-8416-e1652b096042@10.9.112.9@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564432453 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 13:38:07 fir-md1-s1 kernel: Lustre: 23641:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564432680/real 1564432680] req@ffff8f3a71b38300 x1636748950598608/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564432687 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 13:38:43 fir-md1-s1 kernel: Lustre: 23708:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564432716/real 1564432716] req@ffff8f0dda141800 x1636748952880784/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564432723 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 13:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 13:40:25 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 29 13:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 13:40:25 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 29 13:40:30 fir-md1-s1 kernel: Lustre: 20466:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564432823/real 1564432823] req@ffff8f35d5354200 x1636748961353632/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564432830 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 13:41:12 fir-md1-s1 kernel: Lustre: 23737:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1564432865/real 1564432865] req@ffff8f0e373ff200 x1636748964163168/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564432872 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 13:41:32 fir-md1-s1 kernel: Lustre: 25676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564432885/real 1564432885] req@ffff8f13d2704e00 x1636748964972912/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564432892 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 13:42:05 fir-md1-s1 kernel: Lustre: 21447:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564432918/real 1564432918] req@ffff8f1ec522d700 x1636748966547216/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564432925 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 13:42:05 fir-md1-s1 kernel: Lustre: 21447:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 29 13:44:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 13:44:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 13:44:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 13:44:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 13:50:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 13:50:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 13:50:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 13:50:32 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 29 13:54:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 13:54:18 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 13:55:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 13:55:39 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 14:00:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 14:00:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 14:00:44 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 29 14:00:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 14:05:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 14:05:02 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 14:05:29 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0fc4868850 x1634534882999280/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:4/0 lens 488/4536 e 1 to 0 dl 1564434334 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 14:05:36 fir-md1-s1 kernel: Lustre: 22430:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f0fc4868850 x1634534882999280/t0(0) o3->bf0fab1f-ed86-800d-24d6-23f47310966d@10.9.113.8@o2ib4:4/0 lens 488/4504 e 1 to 0 dl 1564434334 ref 1 fl Complete:/0/0 rc 4096/4096 Jul 29 14:08:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 14:08:56 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 29 14:10:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 14:10:44 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 29 14:10:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 14:10:52 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 14:18:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 14:18:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 14:18:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 14:18:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 14:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 14:20:53 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 29 14:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 14:20:53 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 29 14:28:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 14:28:26 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 29 14:28:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 29 14:28:58 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 29 14:31:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 14:31:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 14:31:03 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 14:31:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 14:31:03 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 29 14:38:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 14:38:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 14:39:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 29 14:39:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 29 14:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 14:41:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 14:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 14:41:22 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 29 14:49:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 14:49:38 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 14:49:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1bf31fc400, cur 1564436988 expire 1564436838 last 1564436761 Jul 29 14:49:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 29 14:51:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 14:51:35 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 14:51:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 14:51:35 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 29 14:54:51 fir-md1-s1 kernel: Lustre: 23699:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564437284/real 1564437284] req@ffff8f3bbe346900 x1636749168040608/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564437291 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 14:54:51 fir-md1-s1 kernel: Lustre: 23699:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 29 14:56:53 fir-md1-s1 kernel: Lustre: 50583:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564437406/real 1564437406] req@ffff8f26dac64200 x1636749169408656/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564437413 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 14:57:18 fir-md1-s1 kernel: Lustre: 23708:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564437431/real 1564437431] req@ffff8f0543e13900 x1636749169701584/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564437438 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 14:58:10 fir-md1-s1 kernel: Lustre: 23662:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564437483/real 1564437483] req@ffff8f43df6e3000 x1636749170251200/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 
1564437490 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 14:58:10 fir-md1-s1 kernel: Lustre: 23662:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 29 14:59:51 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564437584/real 1564437584] req@ffff8f10d9268300 x1636749170899888/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564437591 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 29 14:59:51 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 29 14:59:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 15:00:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 15:00:26 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 29 15:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 15:01:42 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 15:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 15:01:42 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 29 15:04:31 fir-md1-s1 kernel: Lustre: 35230:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f12fd798050 x1638797736131216/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:6/0 lens 488/440 e 0 to 0 dl 1564437876 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 15:06:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 15:12:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 15:12:03 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 15:12:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 15:12:03 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 29 15:12:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 15:12:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 15:14:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 15:16:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 76962d9c-bd98-91e6-4550-6d14d19edf1d (at 10.8.9.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ff668c00, cur 1564438617 expire 1564438467 last 1564438390 Jul 29 15:22:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 15:22:04 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 29 15:22:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 15:22:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 15:24:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 15:24:58 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 15:25:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 29 15:25:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 15:32:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 15:32:08 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 29 15:32:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 15:32:39 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 15:37:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 15:37:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 15:38:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 15:38:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 15:42:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 15:42:26 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 29 15:43:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 15:43:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 15:48:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 15:48:26 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 29 15:52:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 15:52:26 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 29 15:52:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 15:52:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 15:53:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 15:53:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 16:02:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 29 16:02:09 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 16:02:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 29 16:02:30 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 29 16:03:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 16:03:48 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 16:04:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 16:04:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 16:12:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 16:12:44 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 16:12:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 16:12:44 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 29 16:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 16:15:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 16:22:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 16:22:57 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 29 16:23:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 16:23:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 16:23:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 16:23:43 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 29 16:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 16:26:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 16:33:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 16:33:01 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 16:33:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 29 16:33:46 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 16:36:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 16:36:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 16:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 16:36:36 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 16:43:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 16:43:03 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 16:46:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 16:46:39 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 16:51:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 16:51:46 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 16:53:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 16:53:17 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 29 16:56:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 16:56:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 16:56:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 16:56:48 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 29 16:57:20 fir-md1-s1 kernel: Lustre: 23097:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f124cea3450 x1638889193082608/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:25/0 lens 488/440 e 0 to 0 dl 1564444645 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 17:02:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 17:02:18 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 29 17:03:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 17:03:30 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 17:06:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 17:06:51 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 29 17:10:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 17:10:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 17:12:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 17:12:45 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 29 17:13:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 17:13:43 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 17:16:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 17:16:52 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 17:20:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 17:20:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 17:22:23 fir-md1-s1 kernel: Lustre: 14791:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f12a6efcc50 x1638803891236944/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:28/0 lens 488/440 e 1 to 0 dl 1564446148 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 17:23:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 17:23:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 17:23:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 17:23:43 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 17:27:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 17:27:02 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 29 
17:30:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 17:30:33 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 29 17:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 17:33:44 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 29 17:34:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 17:34:07 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 29 17:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 17:37:52 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 17:42:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 17:42:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 17:44:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 17:44:04 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 17:44:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 17:44:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 17:48:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 17:48:28 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 29 17:52:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 17:52:53 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 17:54:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 17:54:39 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 29 17:55:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 17:55:34 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 29 17:58:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 17:58:59 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 29 18:04:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 18:04:16 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 18:04:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 18:04:44 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Jul 29 18:06:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 18:06:26 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 29 18:09:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 18:09:02 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 18:15:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 18:15:42 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 29 18:16:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 18:16:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 18:17:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 18:17:05 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 29 18:19:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 18:19:17 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 18:25:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 18:25:42 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 29 18:27:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 18:27:06 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 18:27:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 18:27:48 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 18:29:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 18:29:27 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 18:33:54 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 29 18:35:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 18:35:45 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 29 18:37:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 18:37:07 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 18:39:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 18:39:50 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 18:40:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 18:40:44 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 18:46:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 18:46:19 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 18:47:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f141d9e1400, cur 1564451254 expire 1564451104 last 1564451027 Jul 29 18:47:34 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 29 18:48:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 18:48:18 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 18:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 18:50:02 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 29 18:50:12 fir-md1-s1 kernel: Lustre: 13961:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0bf5bc7450 x1631690063298240/t0(0) o4->13889569-7ed6-b8ab-37e8-66f4333a1d7c@10.9.107.67@o2ib4:17/0 lens 488/448 e 1 to 0 dl 1564451417 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 18:50:19 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f0bf5bc7450 x1631690063298240/t355395277508(0) o4->13889569-7ed6-b8ab-37e8-66f4333a1d7c@10.9.107.67@o2ib4:17/0 lens 488/416 e 1 to 0 dl 1564451417 ref 1 fl Complete:/0/0 rc 0/0 Jul 29 18:54:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 18:54:03 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 18:56:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 18:56:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 18:59:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 18:59:32 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 29 19:00:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 19:00:42 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 19:00:58 fir-md1-s1 kernel: Lustre: 22429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f07f0325c50 x1639243622943680/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564452063 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 19:06:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 19:06:30 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 29 19:10:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 19:10:12 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 19:10:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 19:10:38 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 19:11:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 19:11:07 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 19:16:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 19:16:32 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 29 19:18:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f387d39d000, cur 1564453099 expire 1564452949 last 1564452872 Jul 29 19:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 19:21:13 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 19:21:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 29 19:21:46 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 19:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 19:26:34 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 29 19:28:28 fir-md1-s1 kernel: Lustre: 21484:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f07f0326c50 x1638889584664112/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564453713 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 19:29:29 fir-md1-s1 
kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 19:29:29 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 19:31:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 19:31:16 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 19:32:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 19:32:09 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 19:36:12 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 19:36:13 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 19:36:13 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 29 19:36:18 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 19:36:21 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 19:36:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 941219f8-59e9-a589-ac5e-1597e63add84 (at 10.9.101.30@o2ib4) Jul 29 19:36:42 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 29 19:41:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 19:41:18 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 19:42:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 
10.8.11.6@o2ib6, removing former export from same NID Jul 29 19:42:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 19:46:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 19:46:46 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 29 19:47:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 19:47:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 19:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 19:51:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 19:52:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 19:52:12 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 29 19:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 19:57:24 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 29 20:01:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 20:01:35 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 20:02:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 20:02:13 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 29 20:03:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 20:03:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 20:07:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 20:07:42 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 29 20:11:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 20:11:36 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 29 20:15:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 20:15:09 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 29 20:18:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 20:18:09 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 20:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 20:21:55 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 20:26:14 fir-md1-s1 kernel: Lustre: 49228:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6063f850 x1638798471629936/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:19/0 lens 488/440 e 1 to 0 dl 1564457179 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 20:26:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 20:26:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 29 20:27:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 20:27:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 20:28:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 20:28:09 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 29 20:29:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 20:29:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 20:32:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 20:32:51 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 20:33:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 20:33:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 20:37:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 20:37:16 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 29 20:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 20:38:13 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 29 20:38:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 20:38:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 20:43:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 29 20:43:18 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 29 20:49:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 20:49:13 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 20:49:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 20:49:14 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 29 20:53:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 20:53:59 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 29 20:58:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 20:58:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 20:59:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 20:59:18 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 29 20:59:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 20:59:18 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 29 21:05:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 21:05:11 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 21:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f305adb6c00, cur 1564459618 expire 1564459468 last 1564459391 Jul 29 21:09:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 21:09:38 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 29 21:09:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 21:09:38 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 29 21:15:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 21:15:23 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 29 21:18:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 21:18:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 21:19:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 29 21:19:47 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 29 21:20:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 21:20:19 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 29 21:22:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3d1e30a800, cur 1564460556 expire 1564460406 last 1564460329 Jul 29 21:24:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 21:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 21:25:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 29 21:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 21:30:06 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 29 21:30:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 21:30:20 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 29 21:33:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 21:33:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 21:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 21:35:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 21:40:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 21:40:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 21:40:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 21:40:57 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 29 21:40:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 21:40:57 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 29 21:46:31 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Jul 29 21:46:31 fir-md1-s1 kernel: LNetError: 20180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Jul 29 21:46:31 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f38a9b72200 Jul 29 21:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7eae5f9-18e9-99eb-0207-24a1fdf92451 (at 10.9.113.2@o2ib4) reconnecting Jul 29 21:46:33 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 29 21:46:33 fir-md1-s1 kernel: LustreError: 48201:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f34157a4c50 x1636354872663600/t0(0) o3->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564462006 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:33 fir-md1-s1 
kernel: Lustre: fir-MDT0000: Bulk IO read error with ca693efe-e963-3124-a59d-0beac55f4de3 (at 10.9.112.17@o2ib4), client will retry: rc -110 Jul 29 21:46:33 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 29 21:46:33 fir-md1-s1 kernel: LustreError: 48201:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 9 previous similar messages Jul 29 21:46:34 fir-md1-s1 kernel: LustreError: 52409:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f09597f7c50 x1638804503287664/t0(0) o3->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564462006 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:34 fir-md1-s1 kernel: LustreError: 52409:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 13 previous similar messages Jul 29 21:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with a8495761-7359-3610-2479-b4da362523dd (at 10.9.101.31@o2ib4), client will retry: rc = -110 Jul 29 21:46:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 29 21:46:36 fir-md1-s1 kernel: LustreError: 49470:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f34157a4850 x1638889990079520/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564462006 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:36 fir-md1-s1 kernel: LustreError: 49470:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 29 21:46:41 fir-md1-s1 kernel: Lustre: 21541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19031d6c50 x1631641742874480/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564462006 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:43 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0714f04050 x1638768960898544/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:2/0 lens 
488/440 e 0 to 0 dl 1564462022 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:43 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 29 21:46:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 29 21:46:43 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 21:46:46 fir-md1-s1 kernel: Lustre: 46582:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f4233638450 x1631708146044080/t0(0) o3->2d384d58-fd4c-f6d6-342b-6f9f296484e1@10.9.101.46@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564462011 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:46 fir-md1-s1 kernel: Lustre: 46582:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 29 21:46:51 fir-md1-s1 kernel: LustreError: 21711:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f0a6063ac50 x1631543658446912/t0(0) o4->75a42419-1c36-3d84-69b0-0982bb5ad919@10.9.101.63@o2ib4:21/0 lens 504/448 e 1 to 0 dl 1564462011 ref 1 fl Interpret:/0/0 rc 0/0 Jul 29 21:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 0fafc81c-d2f9-5fcc-1c5e-9d205df82025 (at 10.9.104.20@o2ib4), client will retry: rc = -110 Jul 29 21:46:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 29 21:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 2d384d58-fd4c-f6d6-342b-6f9f296484e1 (at 10.9.101.46@o2ib4), client will retry: rc -110 Jul 29 21:46:51 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 29 21:46:51 fir-md1-s1 kernel: LustreError: 21711:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Jul 29 21:50:12 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply 
req@ffff8f0714f05050 x1638889998561872/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564462217 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 21:50:12 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 29 21:51:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 21:51:06 fir-md1-s1 kernel: Lustre: Skipped 233 previous similar messages Jul 29 21:55:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 21:55:27 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 29 21:56:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 21:56:35 fir-md1-s1 kernel: Lustre: Skipped 156 previous similar messages Jul 29 21:58:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 21:58:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 29 22:01:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 22:01:12 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 29 22:05:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 22:05:29 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 29 22:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 22:06:45 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 29 22:08:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 22:08:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 22:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 22:11:15 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 29 22:16:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 22:16:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 22:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 22:17:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 29 22:19:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 22:19:07 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 29 22:21:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 22:21:16 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 29 22:26:18 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 29 22:27:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 22:27:32 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 29 22:28:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 29 22:28:10 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 29 22:29:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 22:29:16 fir-md1-s1 kernel: LustreError: Skipped 11 previous similar messages Jul 29 22:31:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 22:31:16 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 29 22:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2888c49000, cur 1564465008 expire 1564464858 last 1564464781 Jul 29 22:38:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 22:38:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 22:38:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 22:38:12 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 22:41:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 22:41:16 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 29 22:44:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 22:44:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 22:48:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 22:48:13 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 29 22:48:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 22:48:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 29 22:51:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 22:51:17 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 29 22:55:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 22:55:46 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 29 22:58:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 22:58:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 22:59:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 22:59:18 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 23:01:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 23:01:20 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 29 23:05:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 29 23:05:57 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 29 23:08:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 29 23:08:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 29 23:10:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 29 23:10:56 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 29 23:11:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 23:11:21 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 29 23:16:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 23:16:22 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 23:18:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 29 23:18:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 23:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 23:21:41 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 29 23:22:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 23:22:08 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 29 23:28:10 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 29 23:28:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 29 23:28:58 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 29 23:30:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 23:30:44 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 29 23:31:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 29 23:31:47 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 29 23:32:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 23:32:14 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 29 23:39:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 23:39:27 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 29 23:41:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 29 23:41:51 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 29 23:42:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 29 23:42:21 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 29 23:42:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 23:42:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 29 23:49:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 29 23:49:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 29 23:49:51 fir-md1-s1 kernel: Lustre: 49251:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6063ac50 x1639159531805680/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:26/0 lens 488/4536 e 1 to 0 dl 1564469396 ref 2 fl Interpret:/0/0 rc 0/0 Jul 29 23:49:59 fir-md1-s1 kernel: Lustre: 52409:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f0a6063ac50 x1639159531805680/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:26/0 lens 488/4504 e 1 to 0 dl 1564469396 ref 1 fl Complete:/0/0 rc 4096/4096 Jul 29 23:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 29 23:51:56 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 29 23:52:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 29 23:52:28 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 29 23:54:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 29 23:54:53 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 00:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 00:00:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 00:02:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 00:02:11 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 30 00:02:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 00:02:46 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 30 00:06:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 00:06:09 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 00:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 00:10:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 30 00:12:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 00:12:14 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 30 00:12:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 00:12:52 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 30 00:18:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 00:18:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 00:20:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 00:20:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 00:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 00:22:35 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 30 00:24:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 30 00:24:22 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 30 00:30:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 00:30:28 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 00:30:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 00:30:38 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 30 00:32:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 00:32:53 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 30 00:34:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 00:34:46 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 30 00:40:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 00:40:53 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 00:41:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 00:41:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 00:43:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 00:43:04 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 30 00:45:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 00:45:18 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 30 00:48:39 fir-md1-s1 kernel: Lustre: 81716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6063e050 x1638890036988064/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564472924 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 00:51:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 00:51:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 00:53:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 00:53:04 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 30 00:53:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 00:53:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 00:55:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 30 00:55:54 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 30 01:01:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 01:01:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 01:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 01:03:04 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 30 01:04:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 01:04:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 01:06:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 30 01:06:06 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 30 01:06:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e1a6eec00, cur 1564474018 expire 1564473868 last 1564473791 Jul 30 01:11:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 01:11:07 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 01:13:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 01:13:20 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 30 01:15:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 01:15:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 01:18:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 30 01:18:16 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 30 01:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 01:21:36 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 01:23:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 01:23:35 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 30 01:29:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 01:29:41 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 30 01:32:05 fir-md1-s1 kernel: Lustre: 25085:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f2664938900 x1638908657216176/t0(0) o103->7902ac63-155c-5c64-1b94-de807e6dff37@10.8.8.22@o2ib6:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 01:32:05 fir-md1-s1 kernel: 
Lustre: 25085:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2 previous similar messages Jul 30 01:32:06 fir-md1-s1 kernel: Lustre: 23563:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564475519/real 1564475525] req@ffff8f0bda684500 x1636750313621296/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 1 dl 1564475526 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 01:32:06 fir-md1-s1 kernel: Lustre: 23563:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jul 30 01:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 30 01:32:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 01:32:15 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 01:33:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 01:33:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 01:34:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 01:34:13 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 30 01:39:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 01:39:42 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 30 01:42:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 01:42:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 01:44:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 01:44:15 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Jul 30 01:45:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 01:45:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 01:49:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 30 01:49:53 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 30 01:52:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 01:52:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 01:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 01:54:17 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 30 01:56:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3963ce1800, cur 1564477012 expire 1564476862 last 1564476785 Jul 30 01:58:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 01:58:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 02:00:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 02:00:27 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 30 02:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 02:03:04 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 30 02:04:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 02:04:20 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 30 02:12:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 30 02:12:05 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 30 02:13:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 02:13:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 02:14:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 02:14:22 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 30 02:19:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 02:19:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 02:22:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 02:22:46 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 30 02:23:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 02:23:20 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 02:24:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 02:24:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 02:24:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 02:24:26 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Jul 30 02:25:57 fir-md1-s1 kernel: Lustre: 27440:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f0a278a4500 x1638873637544784/t0(0) o103->e3c32682-5f6c-0001-d03b-79e797f51faf@10.9.115.5@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 02:25:57 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f2c87b40f00 x1631734947813888/t0(0) o103->1b613684-4823-fcc7-0f6e-9ca11e50b913@10.9.106.30@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 02:25:57 fir-md1-s1 kernel: LustreError: 49466:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f2634c64850 x1633757180159232/t0(0) o3->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:27/0 lens 488/440 e 0 to 0 dl 1564478757 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 02:25:57 fir-md1-s1 kernel: LNetError: 
20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 02:25:57 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ef5131a00 Jul 30 02:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc -110 Jul 30 02:25:57 fir-md1-s1 kernel: Lustre: 27440:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 4 previous similar messages Jul 30 02:32:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 02:32:59 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 30 02:33:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 02:33:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 02:35:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 02:35:11 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 30 02:40:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 02:40:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 02:41:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 02:43:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 30 02:43:30 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 30 02:43:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 02:43:45 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 02:45:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 02:45:18 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 30 02:51:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 02:53:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 02:53:40 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 30 02:53:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 02:53:47 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 30 02:55:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 02:55:26 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 30 03:01:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 03:03:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 03:03:42 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 30 03:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 03:03:52 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 03:05:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 03:05:37 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 30 03:06:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 03:06:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 03:07:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 30 03:07:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages Jul 30 03:07:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (5): c: 7, oc: 0, rc: 8 Jul 30 03:07:32 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 25633:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.28.1@o2ib6: deadline 6:1s ago req@ffff8f201cdb4050 x1638886530736128/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:2/0 lens 488/0 e 0 to 0 dl 1564481252 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 25633:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 46537:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=13 reqQ=0 recA=13, svcEst=1, delay=5784 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 25633:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. req@ffff8f201cdb4050 x1638886530736128/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:2/0 lens 488/0 e 0 to 0 dl 1564481252 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 46563:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8f21682ff050 x1638886530736224/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:2/0 lens 488/0 e 0 to 0 dl 1564481252 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 25633:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 20700:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f24db833c00 x1631637846871168/t0(0) o103->339627b1-f298-e293-3cc1-dc6c48f43358@10.9.104.56@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 46563:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 20700:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 23710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564481246/real 1564481246] req@ffff8f25c877f200 x1636750364970528/t0(0) o106->fir-MDT0002@10.9.104.56@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1564481253 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 20700:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f24db833c00 x1631637846871168/t0(0) o103->339627b1-f298-e293-3cc1-dc6c48f43358@10.9.104.56@o2ib4:2/0 lens 328/0 e 0 to 0 dl 1564481252 ref 2 fl New:/0/ffffffff rc 0/-1 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: 20700:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 46524:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f1b22616450 x1638873731011264/t0(0) o3->e3c32682-5f6c-0001-d03b-79e797f51faf@10.9.115.5@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564481252 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 46524:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2371e11400 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f226556ca00 Jul 30 03:07:34 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 2491 seconds Jul 30 03:07:34 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 20 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 23036:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f4425bb1450 x1639159900303136/t0(0) 
o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564481252 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 23036:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2aed052850 x1639159900303616/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564481267 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 16 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: LNetError: 21540:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.8@o2ib6 from 10.0.10.51@o2ib7 Jul 30 03:07:34 fir-md1-s1 kernel: LNetError: 21540:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 14 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 21540:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f21198d1c00 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 49465:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f29f3899e00 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f088ce2b600 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3fcccbdc00 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: fir-OST002e-osc-MDT0002: Connection to fir-OST002e (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 339627b1-f298-e293-3cc1-dc6c48f43358 (at 10.9.104.56@o2ib4), client will retry: rc = -110 
Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0a6063d050 x1634140048828464/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564481266 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 35232:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0a6063e050 x1634140048829824/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564481267 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: LustreError: 35232:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 89c5b213-fa16-71ad-d5f3-58d49989ce10 (at 10.9.115.11@o2ib4), client will retry: rc -110 Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 30 03:07:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 03:07:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 1b613684-4823-fcc7-0f6e-9ca11e50b913 (at 10.9.106.30@o2ib4), client will retry: rc = -110 Jul 30 03:07:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 30 03:07:37 fir-md1-s1 kernel: LustreError: 21566:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 5+5s req@ffff8f36b61f2850 x1638769258963008/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564481252 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:37 fir-md1-s1 kernel: LustreError: 21566:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 13 previous similar messages Jul 30 03:07:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read 
error with 524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 30 03:07:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 03:07:37 fir-md1-s1 kernel: Lustre: 21566:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. req@ffff8f36b61f2850 x1638769258963008/t0(0) o3->524f09b9-37f3-6401-947e-a803ba6b2d1e@10.9.114.5@o2ib4:2/0 lens 488/440 e 0 to 0 dl 1564481252 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 30 03:07:37 fir-md1-s1 kernel: Lustre: 21566:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 28 previous similar messages Jul 30 03:07:39 fir-md1-s1 kernel: Lustre: 20208:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564481251/real 1564481253] req@ffff8f0e3c128c00 x1636750364971424/t0(0) o13->fir-OST0028-osc-MDT0000@10.0.10.107@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564481258 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 03:07:39 fir-md1-s1 kernel: Lustre: 20208:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 36 previous similar messages Jul 30 03:07:39 fir-md1-s1 kernel: Lustre: fir-OST0028-osc-MDT0000: Connection to fir-OST0028 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 30 03:07:40 fir-md1-s1 kernel: LustreError: 48200:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2b3a300c50 x1634140048829824/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564481273 ref 1 fl Interpret:/2/0 rc 0/0 Jul 30 03:07:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 89c5b213-fa16-71ad-d5f3-58d49989ce10 (at 10.9.115.11@o2ib4), client will retry: rc -110 Jul 30 03:07:40 fir-md1-s1 kernel: LustreError: 48200:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Jul 30 03:07:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write 
error with 339627b1-f298-e293-3cc1-dc6c48f43358 (at 10.9.104.56@o2ib4), client will retry: rc = -110 Jul 30 03:07:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 03:07:41 fir-md1-s1 kernel: Lustre: 49227:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a60639050 x1639199011539584/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564481266 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:41 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a6d3aa050 x1631642865061440/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564481266 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:41 fir-md1-s1 kernel: Lustre: 46518:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 30 03:07:43 fir-md1-s1 kernel: Lustre: 21566:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3701fc8450 x1631642865062320/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564481267 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:43 fir-md1-s1 kernel: Lustre: 21566:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 30 03:07:46 fir-md1-s1 kernel: LustreError: 21289:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 13+0s req@ffff8f0a60639050 x1639199011539584/t0(0) o3->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:16/0 lens 488/440 e 1 to 0 dl 1564481266 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 524f09b9-37f3-6401-947e-a803ba6b2d1e (at 10.9.114.5@o2ib4), client will retry: rc -110 Jul 30 03:07:46 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 30 03:07:46 
fir-md1-s1 kernel: LustreError: 21289:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 30 03:07:48 fir-md1-s1 kernel: Lustre: 49465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2baa3ebc50 x1639199011539952/t0(0) o4->904bd105-fefc-cbe7-ee1c-8f3381873cf6@10.9.113.1@o2ib4:23/0 lens 488/448 e 1 to 0 dl 1564481273 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:52 fir-md1-s1 kernel: Lustre: 97659:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1daf32a400 x1634178305716608/t0(0) o101->e4809e9d-cd93-fce4-b050-67f299926009@10.9.101.67@o2ib4:27/0 lens 1792/3288 e 1 to 0 dl 1564481277 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:52 fir-md1-s1 kernel: Lustre: 97659:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 30 03:07:53 fir-md1-s1 kernel: LustreError: 13960:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f0be2300050 x1638890050694592/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:23/0 lens 488/440 e 1 to 0 dl 1564481273 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:07:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 904bd105-fefc-cbe7-ee1c-8f3381873cf6 (at 10.9.113.1@o2ib4), client will retry: rc = -110 Jul 30 03:07:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 30 03:07:53 fir-md1-s1 kernel: LustreError: 13960:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 30 03:08:02 fir-md1-s1 kernel: Lustre: 23608:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f342baf9500 x1634297647814944/t0(0) o101->9081d826-2f83-5b46-ff73-7e6473184838@10.8.17.25@o2ib6:7/0 lens 600/3264 e 0 to 0 dl 1564481287 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:08:02 fir-md1-s1 kernel: Lustre: 
23608:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 30 03:08:02 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.112.12@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f20f899bcc0/0x5d9ee69da1314fb8 lrc: 3/0,0 mode: PR/PR res: [0x200007b2b:0x3fd7:0x0].0x0 bits 0x58/0x0 rrc: 3 type: IBT flags: 0x60200400010020 nid: 10.9.112.12@o2ib4 remote: 0xd026eea09f76924a expref: 8235 pid: 97663 timeout: 3596342 lvb_type: 0 Jul 30 03:08:03 fir-md1-s1 kernel: LustreError: 48203:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f34dadda050 x1638798896491136/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:3/0 lens 488/440 e 0 to 0 dl 1564481283 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:08:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 904bd105-fefc-cbe7-ee1c-8f3381873cf6 (at 10.9.113.1@o2ib4), client will retry: rc -110 Jul 30 03:08:03 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 30 03:08:03 fir-md1-s1 kernel: LustreError: 48203:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 30 03:08:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f165680de80/0x5d9ee69db2c3330f lrc: 3/0,0 mode: PR/PR res: [0x2c002c595:0x1fb6c:0x0].0x0 bits 0x1b/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.8.22@o2ib6 remote: 0x475ff4c7a97b725f expref: 8158 pid: 21459 timeout: 3596344 lvb_type: 0 Jul 30 03:08:04 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 03:08:04 fir-md1-s1 kernel: LustreError: 10143:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ebcf5b600 x1636750365098880/t0(0) 
o104->fir-MDT0002@10.8.8.22@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 30 03:08:04 fir-md1-s1 kernel: LustreError: 10143:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Jul 30 03:08:04 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 30 03:14:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 03:14:12 fir-md1-s1 kernel: Lustre: Skipped 446 previous similar messages Jul 30 03:14:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 03:14:33 fir-md1-s1 kernel: Lustre: Skipped 137 previous similar messages Jul 30 03:15:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 03:15:38 fir-md1-s1 kernel: Lustre: Skipped 636 previous similar messages Jul 30 03:21:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 03:21:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 03:24:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 03:24:21 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 30 03:25:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 03:25:44 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 30 03:25:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 03:25:51 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 03:35:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 03:35:07 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 30 03:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 03:36:01 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 30 03:36:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 03:36:54 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 30 03:39:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 03:42:57 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 30 03:42:57 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 6 previous similar messages Jul 30 03:42:57 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.101@o2ib7 (5): c: 0, oc: 0, rc: 8 Jul 30 03:42:57 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 6 previous similar messages Jul 30 03:42:58 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 30 03:42:58 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 5 previous similar messages Jul 30 03:42:58 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.104@o2ib7 (6): c: 1, oc: 0, rc: 8 Jul 30 03:42:58 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 5 previous similar messages Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: 46519:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f2cae718450 x1638886578350768/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: 20208:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1564483372/real 1564483378] req@ffff8f0dbf910f00 x1636750381279920/t0(0) o13->fir-OST000f-osc-MDT0002@10.0.10.104@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1564483379 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: 20208:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: fir-OST000f-osc-MDT0002: Connection to fir-OST000f (at 
10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 30 03:42:58 fir-md1-s1 kernel: LustreError: 46545:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f3656ac2050 x1639159985572560/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:28/0 lens 488/440 e 0 to 0 dl 1564483378 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 30 03:42:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.108@o2ib7: accepting Jul 30 03:42:58 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Jul 30 03:42:58 fir-md1-s1 kernel: LustreError: 46561:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check Jul 30 03:42:58 fir-md1-s1 kernel: LustreError: 46561:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.8.20.35@o2ib6 x1639195788519120 Jul 30 03:42:58 fir-md1-s1 kernel: Lustre: 46519:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 10 previous similar messages Jul 30 03:42:59 fir-md1-s1 kernel: LustreError: 44036:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1b22616450 x1631571227452752/t0(0) o4->9a7d7178-90e8-1693-f97d-03806a59f3b6@10.8.26.36@o2ib6:4/0 lens 504/448 e 0 to 0 dl 1564483384 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:42:59 fir-md1-s1 kernel: LustreError: 44036:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 31 previous similar messages Jul 30 03:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 
9a7d7178-90e8-1693-f97d-03806a59f3b6 (at 10.8.26.36@o2ib6), client will retry: rc = -110 Jul 30 03:42:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 30 03:43:00 fir-md1-s1 kernel: LustreError: 46568:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1c93f1e050 x1638888789335312/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:12/0 lens 488/440 e 1 to 0 dl 1564483392 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 8df94149-5690-262d-f805-cc7898f99b40 (at 10.8.16.5@o2ib6), client will retry: rc -110 Jul 30 03:43:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 30 03:43:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 03:43:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 03:43:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with d3c03fa2-3e41-4741-cf2d-21c94adb10e5 (at 10.9.108.40@o2ib4), client will retry: rc = -110 Jul 30 03:43:00 fir-md1-s1 kernel: LustreError: 46568:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 7 previous similar messages Jul 30 03:43:02 fir-md1-s1 kernel: LustreError: 35233:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f36b61f2850 x1634178285404688/t0(0) o4->4eb33ecd-a5f0-193d-5f26-5af6c5e43062@10.9.109.68@o2ib4:28/0 lens 488/448 e 0 to 0 dl 1564483408 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:02 fir-md1-s1 kernel: LustreError: 55159:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f423363cc50 x1634178285404720/t0(0) o4->4eb33ecd-a5f0-193d-5f26-5af6c5e43062@10.9.109.68@o2ib4:28/0 lens 488/448 e 0 to 0 dl 1564483408 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 4eb33ecd-a5f0-193d-5f26-5af6c5e43062 (at 10.9.109.68@o2ib4), client will 
retry: rc = -110 Jul 30 03:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 1b613684-4823-fcc7-0f6e-9ca11e50b913 (at 10.9.106.30@o2ib4), client will retry: rc = -110 Jul 30 03:43:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 30 03:43:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6), client will retry: rc -110 Jul 30 03:43:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 30 03:43:09 fir-md1-s1 kernel: Lustre: 22428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6063c850 x1631550741169088/t0(0) o4->a6e937d6-4ce3-9563-d22c-a60e645be4a6@10.9.108.6@o2ib4:13/0 lens 504/448 e 1 to 0 dl 1564483393 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:09 fir-md1-s1 kernel: Lustre: 22428:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 30 03:43:13 fir-md1-s1 kernel: LustreError: 13961:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f0a6063c850 x1631550741169088/t0(0) o4->a6e937d6-4ce3-9563-d22c-a60e645be4a6@10.9.108.6@o2ib4:13/0 lens 504/448 e 1 to 0 dl 1564483393 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a6e937d6-4ce3-9563-d22c-a60e645be4a6 (at 10.9.108.6@o2ib4), client will retry: rc = -110 Jul 30 03:43:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 03:43:13 fir-md1-s1 kernel: Lustre: 35233:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3d442cd050 x1636450290212304/t0(0) o4->f3b73f80-5edd-b2a2-a7a2-f0eb0f74bb77@10.9.102.47@o2ib4:18/0 lens 488/448 e 1 to 0 dl 1564483398 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:13 fir-md1-s1 kernel: Lustre: 35233:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 30 
03:43:18 fir-md1-s1 kernel: LustreError: 46537:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f3d442cd050 x1636450290212304/t0(0) o4->f3b73f80-5edd-b2a2-a7a2-f0eb0f74bb77@10.9.102.47@o2ib4:18/0 lens 488/448 e 1 to 0 dl 1564483398 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 03:43:18 fir-md1-s1 kernel: LustreError: 46537:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 30 03:45:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 03:45:18 fir-md1-s1 kernel: Lustre: Skipped 497 previous similar messages Jul 30 03:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 03:46:01 fir-md1-s1 kernel: Lustre: Skipped 784 previous similar messages Jul 30 03:46:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.34@o2ib6, removing former export from same NID Jul 30 03:46:58 fir-md1-s1 kernel: Lustre: Skipped 250 previous similar messages Jul 30 03:51:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 03:51:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 03:55:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 03:55:34 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 03:56:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 03:56:31 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 03:57:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 03:57:35 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 04:02:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 04:02:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 04:05:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 04:05:44 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 30 04:06:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 04:06:34 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 30 04:10:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 04:10:48 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 30 04:13:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 04:13:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 04:15:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 04:15:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 04:16:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 04:16:52 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 30 04:21:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 04:21:43 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound). Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 46552:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f1819567850 x1634140202050768/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 46552:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=30 reqQ=0 recA=30, svcEst=1, delay=6411 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 46552:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 2 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 27583:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.11@o2ib4: deadline 6:1s ago req@ffff8f1819567850 x1634140202050768/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:17/0 lens 488/0 e 0 to 0 dl 1564485917 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 27583:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 
46552:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f1819567850 x1634140202050768/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:17/0 lens 488/0 e 0 to 0 dl 1564485917 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 46552:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 32 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 27583:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. req@ffff8f1819567850 x1634140202050768/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:17/0 lens 488/0 e 0 to 0 dl 1564485917 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 27482:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after -1+1s req@ffff8f237a8b9050 x1635717560085440/t0(0) o4->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:17/0 lens 488/448 e 0 to 0 dl 1564485917 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1564485911/real 1564485918] req@ffff8f3debbe1200 x1636750399356864/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 1 dl 1564485918 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with bf0fab1f-ed86-800d-24d6-23f47310966d (at 10.9.113.8@o2ib4), client will retry: rc -110 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 23589:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 40 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 30 04:25:18 fir-md1-s1 
kernel: Lustre: Skipped 40 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 25633:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f20c5306850 x1631643159754912/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:17/0 lens 488/440 e 0 to 0 dl 1564485917 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 25633:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 3 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 5, rc: 6 Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds Jul 30 04:25:18 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f253450b000 Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 46562:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.16.5@o2ib6 from 10.0.10.51@o2ib7 Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 46562:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 1 previous similar message Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 46562:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f237b899200 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc 
ffff8f2c2c2bf200 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 27581:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f24420e5400 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f253450c000 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e9a306200 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f2c2c2bee00 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20194:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3620b82200 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3338e66800 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f288f87b400 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f1c733fd800 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 24563:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f386d11a800 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 46580:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f33b6259c00 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 46526:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f33b6259200 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 22157:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f33b6259400 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 69438:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8f226556dc00 Jul 30 04:25:18 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.201@o2ib7: 
connected Jul 30 04:25:18 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f168b4cc000 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33b625fe00 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2534509000 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0d8838b600 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 9dcf2f2b-339d-b96d-0792-e79b27f28969 (at 10.8.28.2@o2ib6), client will retry: rc = -110 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33b625f800 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f0d8838ec00 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33b6258200 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 49474:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f27d0a2a450 x1631643159745456/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:0/0 lens 488/440 e 1 to 0 dl 1564485930 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 49474:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 
21864:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=7, delay=6241 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 21864:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 4 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 21864:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f1ffe7b3900 x1639990788274128/t0(0) o37->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:17/0 lens 448/408 e 0 to 0 dl 1564485917 ref 1 fl Complete:/0/0 rc 0/0 Jul 30 04:25:18 fir-md1-s1 kernel: Lustre: 21864:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 33 previous similar messages Jul 30 04:25:18 fir-md1-s1 kernel: LustreError: 21685:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3656ac5050 x1638890058039888/t0(0) o4->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:5/0 lens 488/448 e 1 to 0 dl 1564485935 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ca693efe-e963-3124-a59d-0beac55f4de3 (at 10.9.112.17@o2ib4), client will retry: rc -110 Jul 30 04:25:19 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 04:25:19 fir-md1-s1 kernel: Lustre: 21424:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564485912/real 1564485912] req@ffff8f3db5e4bf00 x1636750399357696/t0(0) o104->fir-MDT0000@10.9.101.46@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564485919 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 04:25:19 fir-md1-s1 kernel: Lustre: 21424:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 30 04:25:20 fir-md1-s1 kernel: LustreError: 14792:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f0714f05c50 x1633705800746448/t0(0) o4->ce25d0e1-042f-8e04-e899-a91b78d4bc2b@10.9.102.61@o2ib4:2/0 lens 504/448 e 1 to 0 dl 1564485932 ref 1 
fl Interpret:/0/0 rc 0/0 Jul 30 04:25:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with d3c03fa2-3e41-4741-cf2d-21c94adb10e5 (at 10.9.108.40@o2ib4), client will retry: rc = -110 Jul 30 04:25:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 04:25:20 fir-md1-s1 kernel: LustreError: 14792:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 13 previous similar messages Jul 30 04:25:22 fir-md1-s1 kernel: LustreError: 56756:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3c3e2ba850 x1631550976701488/t0(0) o4->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:5/0 lens 488/448 e 1 to 0 dl 1564485935 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:22 fir-md1-s1 kernel: LustreError: 56756:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 30 04:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4), client will retry: rc = -110 Jul 30 04:25:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 30 04:25:23 fir-md1-s1 kernel: LustreError: 21541:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 6+6s req@ffff8f21f6ac8850 x1631593847625472/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1564485917 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:23 fir-md1-s1 kernel: LustreError: 21541:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 14 previous similar messages Jul 30 04:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 8df94149-5690-262d-f805-cc7898f99b40 (at 10.8.16.5@o2ib6), client will retry: rc -110 Jul 30 04:25:23 fir-md1-s1 kernel: Lustre: 21390:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:6s); client may timeout. 
req@ffff8f19f3be6850 x1638871106221008/t0(0) o3->8df94149-5690-262d-f805-cc7898f99b40@10.8.16.5@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1564485917 ref 1 fl Complete:/0/ffffffff rc -110/-1 Jul 30 04:25:23 fir-md1-s1 kernel: Lustre: 21390:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 32 previous similar messages Jul 30 04:25:25 fir-md1-s1 kernel: Lustre: 23715:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564485918/real 1564485918] req@ffff8f3694823900 x1636750399356880/t0(0) o104->fir-MDT0000@10.9.101.25@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564485925 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 30 04:25:25 fir-md1-s1 kernel: Lustre: 24570:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2cae718450 x1638871106220512/t0(0) o3->8df94149-5690-262d-f805-cc7898f99b40@10.8.16.5@o2ib6:0/0 lens 488/440 e 1 to 0 dl 1564485930 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:26 fir-md1-s1 kernel: Lustre: 21711:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6063bc50 x1631543661627744/t0(0) o4->75a42419-1c36-3d84-69b0-0982bb5ad919@10.9.101.63@o2ib4:1/0 lens 504/448 e 1 to 0 dl 1564485931 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:26 fir-md1-s1 kernel: Lustre: 21711:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Jul 30 04:25:28 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1310412050 x1631565524292016/t0(0) o4->30b46016-5e0f-ddd3-494b-68e306e1f0e9@10.9.104.30@o2ib4:3/0 lens 488/448 e 1 to 0 dl 1564485933 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:28 fir-md1-s1 kernel: Lustre: 21709:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 30 04:25:30 fir-md1-s1 kernel: LustreError: 21037:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ 
timeout on bulk READ after 19+0s req@ffff8f2aed057050 x1638887005782880/t0(0) o3->666b60d6-ed92-c98b-c78c-4bfc3f3e7231@10.8.16.2@o2ib6:0/0 lens 488/440 e 1 to 0 dl 1564485930 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:30 fir-md1-s1 kernel: LustreError: 49472:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 19+0s req@ffff8f2971291450 x1631593847624464/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:0/0 lens 488/440 e 1 to 0 dl 1564485930 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 30 04:25:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 04:25:30 fir-md1-s1 kernel: LustreError: 21037:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 30 04:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4), client will retry: rc = -110 Jul 30 04:25:31 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 30 04:25:33 fir-md1-s1 kernel: Lustre: 35237:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6063b450 x1631544172144448/t0(0) o4->59ed9195-3053-54a1-9f0d-fc01c085e1aa@10.9.105.39@o2ib4:8/0 lens 488/448 e 1 to 0 dl 1564485938 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:38 fir-md1-s1 kernel: LustreError: 46526:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1b2f6ee450 x1634982859912016/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:8/0 lens 488/440 e 1 to 0 dl 1564485938 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:25:38 fir-md1-s1 kernel: LustreError: 46526:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 10 previous similar messages Jul 30 04:25:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 
12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6), client will retry: rc -110 Jul 30 04:25:38 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Jul 30 04:25:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting Jul 30 04:25:46 fir-md1-s1 kernel: Lustre: Skipped 461 previous similar messages Jul 30 04:27:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 04:27:00 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Jul 30 04:31:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 04:31:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 04:32:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 04:32:39 fir-md1-s1 kernel: Lustre: Skipped 295 previous similar messages Jul 30 04:34:56 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 30 04:34:56 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 30 04:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 04:35:46 fir-md1-s1 kernel: Lustre: Skipped 136 previous similar messages Jul 30 04:37:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 04:37:21 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 30 04:43:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 04:43:17 fir-md1-s1 kernel: 
Lustre: Skipped 34 previous similar messages Jul 30 04:46:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 04:46:08 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 30 04:47:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 04:47:22 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 30 04:48:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 04:54:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 04:54:01 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 04:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 04:56:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 04:57:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 04:57:39 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 30 04:59:53 fir-md1-s1 kernel: Lustre: 27064:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f1f618a2850 x1637987041193312/t0(0) o35->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 30 04:59:53 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 30 04:59:53 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (6): c: 6, oc: 0, rc: 7 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 
46518:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f2e6a94b850 x1639160224312320/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1564487993 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 04:59:53 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 04:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with c1bbe4f4-a78a-a916-da69-f738d5b89f92 (at 10.9.114.7@o2ib4), client will retry: rc -110 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20186:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3338e66400 Jul 30 04:59:53 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.209@o2ib7: 2070 seconds Jul 30 04:59:53 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 1 previous similar message Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0d88389600 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f324478a800 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1c733fe600 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f4164e2b200 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0ecd5a9400 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f34d8c07600 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f348e453800 Jul 30 
04:59:53 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3912332e00 Jul 30 04:59:53 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f3964d6d400 Jul 30 04:59:53 fir-md1-s1 kernel: Lustre: 27064:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 30 04:59:54 fir-md1-s1 kernel: LustreError: 49251:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f126e27f050 x1634140275905648/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564488007 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:59:54 fir-md1-s1 kernel: LustreError: 49251:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 27 previous similar messages Jul 30 04:59:55 fir-md1-s1 kernel: LustreError: 21040:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f26b8f34850 x1638799049558752/t0(0) o3->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:7/0 lens 488/440 e 1 to 0 dl 1564488007 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:59:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 904bd105-fefc-cbe7-ee1c-8f3381873cf6 (at 10.9.113.1@o2ib4), client will retry: rc -110 Jul 30 04:59:55 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 30 04:59:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 904bd105-fefc-cbe7-ee1c-8f3381873cf6 (at 10.9.113.1@o2ib4), client will retry: rc = -110 Jul 30 04:59:55 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 30 04:59:55 fir-md1-s1 kernel: LustreError: 21040:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 30 previous similar messages Jul 30 04:59:56 fir-md1-s1 kernel: LustreError: 24070:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f06d4a13850 x1636578143617728/t0(0) o4->86fa2497-cbd1-3103-4628-e12187b558d9@10.9.101.25@o2ib4:8/0 lens 488/448 e 1 to 0 
dl 1564488008 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:59:56 fir-md1-s1 kernel: LustreError: 24070:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 14 previous similar messages Jul 30 04:59:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 86fa2497-cbd1-3103-4628-e12187b558d9 (at 10.9.101.25@o2ib4), client will retry: rc = -110 Jul 30 04:59:56 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 30 04:59:58 fir-md1-s1 kernel: LustreError: 46554:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f22598c1050 x1631683986157184/t0(0) o4->88ec999f-c6f4-0281-c377-b70d1594553b@10.8.12.29@o2ib6:13/0 lens 504/448 e 1 to 0 dl 1564488013 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 04:59:58 fir-md1-s1 kernel: LustreError: 46554:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 30 04:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 88ec999f-c6f4-0281-c377-b70d1594553b (at 10.8.12.29@o2ib6), client will retry: rc = -110 Jul 30 04:59:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 05:00:01 fir-md1-s1 kernel: Lustre: 22285:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564487994/real 1564487994] req@ffff8f19ff035400 x1636750415264992/t0(0) o104->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564488001 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 05:00:02 fir-md1-s1 kernel: Lustre: 24570:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3005cb0c50 x1634982913454032/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:7/0 lens 488/440 e 1 to 0 dl 1564488007 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 05:00:02 fir-md1-s1 kernel: Lustre: 24570:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 30 05:00:02 fir-md1-s1 kernel: Lustre: 22649:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add 
any time (5/5), not sending early reply req@ffff8f24f474f050 x1634982913453872/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:7/0 lens 488/440 e 1 to 0 dl 1564488007 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 05:00:02 fir-md1-s1 kernel: Lustre: 22649:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 30 05:00:05 fir-md1-s1 kernel: LustreError: 46519:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3005cb0c50 x1634982913454032/t0(0) o3->12e474d9-b4d9-2c7f-2e45-e7d8f457f930@10.8.16.8@o2ib6:7/0 lens 488/440 e 1 to 0 dl 1564488007 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 05:00:05 fir-md1-s1 kernel: LustreError: 46519:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 30 05:00:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 12e474d9-b4d9-2c7f-2e45-e7d8f457f930 (at 10.8.16.8@o2ib6), client will retry: rc -110 Jul 30 05:00:05 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 05:00:08 fir-md1-s1 kernel: Lustre: 29829:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e6a94a450 x1631547047341792/t0(0) o4->efe78663-94f1-74d7-2a31-a7f3a5a9cd60@10.9.104.2@o2ib4:13/0 lens 504/448 e 1 to 0 dl 1564488013 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 05:00:13 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2201253850 x1636344511904320/t0(0) o3->401116e7-2bba-1e71-6be4-4599d07f8edd@10.8.18.14@o2ib6:13/0 lens 488/440 e 1 to 0 dl 1564488013 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 05:00:13 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 6 previous similar messages Jul 30 05:00:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with efe78663-94f1-74d7-2a31-a7f3a5a9cd60 (at 10.9.104.2@o2ib4), client will retry: rc = -110 Jul 30 05:00:13 fir-md1-s1 kernel: Lustre: Skipped 1 
previous similar message Jul 30 05:04:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 05:04:34 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 30 05:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 05:06:29 fir-md1-s1 kernel: Lustre: Skipped 271 previous similar messages Jul 30 05:07:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 05:07:45 fir-md1-s1 kernel: Lustre: Skipped 377 previous similar messages Jul 30 05:14:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 05:14:35 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 30 05:16:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 05:16:33 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 05:17:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 05:17:47 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 30 05:18:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 05:24:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 05:24:38 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 30 05:26:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 05:26:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 05:27:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 05:27:47 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 30 05:28:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 05:36:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 05:36:51 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 05:37:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 05:37:51 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 30 05:37:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 05:37:51 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 30 05:39:38 fir-md1-s1 kernel: Lustre: 23608:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34f4379b00 x1638542961929600/t0(0) o101->1890d675-ce1f-cd8f-dea3-5b5821d43c68@10.8.0.67@o2ib6:13/0 lens 576/3264 e 1 to 0 dl 1564490383 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 05:39:38 fir-md1-s1 kernel: Lustre: 23608:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 30 05:41:49 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 05:42:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 05:44:07 fir-md1-s1 kernel: Lustre: 23093:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06d4a13850 x1639160363347936/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:12/0 lens 488/440 e 1 to 0 dl 1564490652 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 05:44:18 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:6s); client may timeout. req@ffff8f06d4a13850 x1639160363347936/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:12/0 lens 488/408 e 1 to 0 dl 1564490652 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 30 05:44:18 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Jul 30 05:44:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 05:44:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 05:47:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 05:47:04 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 05:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 05:47:52 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 05:47:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 05:47:57 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 30 05:50:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 05:57:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 05:57:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 30 05:57:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 38fc721f-2581-5cc7-2331-7b71af28244a (at 10.8.7.30@o2ib6) reconnecting Jul 30 05:57:26 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 05:58:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 05:58:33 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 30 06:02:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 06:02:36 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 06:07:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 06:07:31 fir-md1-s1 kernel: Lustre: Skipped 28 previous 
similar messages Jul 30 06:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 06:08:48 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 30 06:11:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 06:11:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 06:12:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 06:12:38 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 30 06:17:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 06:17:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 06:19:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 06:19:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 06:19:33 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 30 06:21:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 06:22:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 06:22:49 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 06:28:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 06:28:05 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 30 06:29:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 06:29:34 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 30 06:31:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 06:31:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 06:33:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 30 06:33:29 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 30 06:38:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 06:38:07 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 06:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 06:39:45 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 30 06:42:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 06:43:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 06:43:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 06:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 06:48:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 06:49:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 06:49:55 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 30 06:54:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 06:54:28 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 06:58:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 06:58:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 07:00:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 07:00:10 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 30 07:05:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 07:05:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 07:06:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 07:08:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 07:08:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 07:08:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 07:08:52 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 07:10:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 07:10:14 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 30 07:11:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 07:11:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 07:15:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 07:15:35 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 07:18:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 07:18:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 07:20:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 07:20:23 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 30 07:25:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 07:25:44 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 30 07:28:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 07:28:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 07:29:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 07:29:23 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 30 07:30:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 07:30:27 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 30 07:32:36 fir-md1-s1 kernel: Lustre: 35232:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06d4a10050 x1631643821468112/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:11/0 lens 488/440 e 1 to 0 dl 1564497161 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 07:32:42 fir-md1-s1 kernel: Lustre: 21713:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f06d4a10050 x1631643821468112/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:11/0 lens 488/408 e 1 to 0 dl 1564497161 ref 1 fl Complete:/0/0 rc 131072/131072 Jul 30 07:36:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 07:36:10 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 30 07:39:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 07:39:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 30 07:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 07:40:42 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 30 07:40:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 30 07:47:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 07:47:08 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 30 07:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 07:50:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 07:51:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 07:51:20 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 30 07:57:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 07:57:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 30 07:58:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 08:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 08:00:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 08:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 08:02:23 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 30 08:07:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 08:07:29 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 08:10:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 08:10:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 08:11:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 08:12:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 08:12:25 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 30 08:18:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 30 08:18:45 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 30 08:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 08:20:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 08:21:35 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f1c5140d050 x1639160880324192/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:5/0 lens 488/440 e 0 to 0 dl 1564500095 ref 2 fl Interpret:/0/0 rc 0/0 Jul 30 08:21:35 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 30 08:21:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -110 Jul 30 08:21:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 30 08:22:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 08:22:32 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 30 08:23:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 08:27:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 08:29:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 08:29:44 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 08:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 08:31:23 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 08:32:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 08:32:40 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 30 08:32:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 08:33:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 08:34:15 fir-md1-s1 kernel: LustreError: 46549:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f248bf2a450 x1632261186828640/t0(0) o4->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:19/0 lens 504/448 e 0 to 0 dl 1564500859 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 08:34:15 fir-md1-s1 kernel: LustreError: 46549:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 30 08:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6), client will retry: rc = -110 Jul 30 08:37:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 08:38:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 08:39:42 fir-md1-s1 kernel: Lustre: 23673:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564501175/real 1564501175] req@ffff8f32de13a700 x1636750515954848/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564501182 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 08:39:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 08:39:46 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 30 08:42:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 08:42:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 08:42:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 08:42:41 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 30 08:45:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 08:50:03 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1bc5d2f450 x1639160941415536/t0(0) o3->9177a8c2-b1c2-f6db-3e46-041bce50e59a@10.9.113.4@o2ib4:21/0 lens 488/440 e 1 to 0 dl 1564501821 ref 1 fl Interpret:/0/0 rc 0/0 Jul 30 08:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9177a8c2-b1c2-f6db-3e46-041bce50e59a (at 10.9.113.4@o2ib4), client will retry: rc -107 Jul 30 08:50:03 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 32 previous similar messages Jul 30 08:51:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 08:51:17 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 30 08:52:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 08:52:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 08:53:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 08:53:04 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 30 08:58:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:00:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:01:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 09:01:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 09:01:41 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 30 09:02:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 09:02:42 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 09:03:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 09:03:08 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 30 09:04:49 fir-md1-s1 kernel: Lustre: 23607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564502682/real 1564502682] req@ffff8f29347dd700 x1636750527252608/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564502689 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:06:21 fir-md1-s1 kernel: Lustre: 23595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564502774/real 1564502774] req@ffff8f10323d1b00 x1636750528149168/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564502781 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:07:26 fir-md1-s1 kernel: Lustre: 21676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564502839/real 1564502839] req@ffff8f29347dd400 x1636750528716096/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564502846 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:08:12 fir-md1-s1 kernel: Lustre: 23570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564502885/real 1564502885] req@ffff8f07ca342d00 x1636750528889040/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564502892 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:08:19 
fir-md1-s1 kernel: Lustre: 24584:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564502892/real 1564502892] req@ffff8f1a0ad8b600 x1636750528914752/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564502899 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:09:24 fir-md1-s1 kernel: Lustre: 23723:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564502957/real 1564502957] req@ffff8f315bd19500 x1636750529021264/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564502964 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:10:08 fir-md1-s1 kernel: Lustre: 21422:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564503001/real 1564503001] req@ffff8f3662c77500 x1636750529464864/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564503008 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 30 09:10:08 fir-md1-s1 kernel: Lustre: 21422:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 30 09:11:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 09:11:42 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 30 09:12:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 09:12:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 30 09:13:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 09:13:12 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 09:14:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 09:20:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:21:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:22:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 09:22:13 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 30 09:22:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 09:22:57 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 09:23:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:23:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 09:23:14 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 30 09:32:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 09:33:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 09:33:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 09:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 09:33:19 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 30 09:33:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 09:33:44 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 09:36:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:41:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 09:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 09:43:22 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 30 09:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 09:43:22 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 30 09:44:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 09:44:55 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 09:53:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 09:53:39 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 30 09:53:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 09:53:39 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 30 09:54:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 09:55:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 09:55:58 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 30 10:01:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 10:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 10:03:43 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 10:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 10:03:43 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 30 10:05:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 10:06:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 10:06:36 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 30 10:08:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 10:13:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 10:13:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 10:14:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 10:14:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 10:14:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 10:14:01 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 30 10:17:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 30 10:17:36 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 30 10:24:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 10:24:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 10:24:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 10:24:02 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 30 10:24:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 10:24:15 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 10:27:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 30 10:27:49 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 30 10:34:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 10:34:04 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 30 10:34:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Jul 30 10:34:27 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 30 10:34:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 10:34:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 10:38:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 10:38:50 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 30 10:44:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 10:44:09 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 30 10:45:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 10:45:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 10:45:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 10:45:13 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 10:50:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 10:50:41 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 30 10:54:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 10:54:13 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 30 10:55:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 10:55:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 10:55:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 10:55:19 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 11:00:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 11:00:50 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 30 11:04:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 11:04:14 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Jul 30 11:05:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 11:05:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 11:11:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 11:11:20 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 11:12:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 11:12:13 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 11:14:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 11:14:26 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 11:15:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 11:15:48 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 30 11:22:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 11:22:06 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 30 11:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 11:24:46 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 11:26:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 11:26:06 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 30 11:26:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 11:26:58 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 30 11:33:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 11:33:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 11:34:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 11:34:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 30 11:36:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 11:36:06 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 11:40:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 11:40:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 11:43:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 11:43:53 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 30 11:44:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 11:44:48 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Jul 30 11:46:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 11:46:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 11:51:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 11:51:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 11:55:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 11:55:14 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 30 11:56:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 11:56:16 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 11:56:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 11:56:57 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 30 12:01:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 12:01:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 12:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 12:05:29 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 30 12:06:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 12:06:28 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 12:07:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 12:07:06 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 30 12:12:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 12:12:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 12:15:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 12:15:34 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 30 12:16:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 12:16:31 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 12:17:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 12:17:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 12:28:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16c7dbbc00, cur 1564514892 expire 1564514742 last 1564514665 Jul 30 12:28:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 12:28:23 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 30 12:28:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 12:28:27 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 12:29:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 12:29:14 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 12:30:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 12:30:07 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 12:38:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 12:38:37 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 30 12:38:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 12:38:57 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 12:39:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 12:39:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 12:43:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 30 12:43:40 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 12:48:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 12:48:39 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 30 12:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 12:49:11 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 12:50:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 12:50:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 12:54:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 12:54:25 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 12:58:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 12:58:58 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 12:59:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 12:59:21 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 13:00:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 13:00:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 13:05:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 13:05:00 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 30 13:09:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 13:09:03 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 30 13:09:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 13:09:22 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 13:15:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 13:15:24 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 13:16:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 13:19:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 13:19:06 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 30 13:19:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 13:19:33 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 13:26:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 13:26:00 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 30 13:27:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 13:27:20 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 30 13:29:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 13:29:18 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 30 13:31:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 13:31:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 13:36:57 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 13:37:01 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 13:37:07 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 13:37:12 
fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 13:37:12 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 30 13:37:21 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 13:37:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 13:37:26 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 13:37:31 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 13:37:31 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Jul 30 13:37:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 13:37:54 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 30 13:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 13:39:39 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 30 13:41:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 13:41:55 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 30 13:50:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 13:50:01 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 30 13:50:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, 
removing former export from same NID Jul 30 13:50:06 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 13:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 13:51:56 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 13:53:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 13:53:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 14:00:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 14:00:21 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 30 14:00:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 30 14:00:23 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 30 14:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 14:02:18 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 14:05:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 14:05:19 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 14:10:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 14:10:27 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 30 14:10:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 14:10:27 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Jul 30 14:13:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 14:13:00 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 14:15:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 14:15:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 14:20:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 14:20:33 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 30 14:20:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 14:20:33 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 30 14:23:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 14:23:09 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 14:27:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 14:27:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 14:30:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 14:30:57 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 30 14:32:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 14:32:49 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 30 14:33:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 14:33:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 30 14:37:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 14:37:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 14:40:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 14:40:59 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 30 14:43:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 14:43:25 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 30 14:43:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 14:43:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 14:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 665f37fd-3c14-5f07-51f7-2b6b1af3018b (at 10.9.115.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f353b7f3c00, cur 1564523071 expire 1564522921 last 1564522844 Jul 30 14:49:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 14:49:12 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 14:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 14:51:04 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 30 14:53:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 30 14:53:35 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 30 14:53:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 14:53:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 14:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 14:59:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 15:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 15:01:08 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Jul 30 15:03:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 15:03:36 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 30 15:04:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 15:04:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 30 15:10:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 15:10:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 15:11:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 15:11:09 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 30 15:11:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15da1ba400, cur 1564524702 expire 1564524552 last 1564524475 Jul 30 15:11:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 30 15:13:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 15:13:37 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 15:15:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 15:15:10 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 15:21:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 15:21:21 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 30 15:23:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 15:23:50 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 30 15:24:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 15:24:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 15:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 15:25:18 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 30 15:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f20e9d09400, cur 1564525753 expire 1564525603 last 1564525526 Jul 30 15:32:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 15:32:08 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 30 15:36:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 15:36:08 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 15:36:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 15:36:34 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 15:38:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 15:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 15:42:11 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 30 15:46:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 15:46:26 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 30 15:47:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 30 15:47:27 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 30 15:50:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 15:50:26 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 15:52:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 15:52:15 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 30 15:56:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 15:56:36 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 30 15:59:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 30 15:59:49 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 30 16:00:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 16:00:31 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 16:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 16:02:18 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 30 16:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 16:07:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 16:11:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 16:11:59 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 16:12:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 16:12:18 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 30 16:17:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Jul 30 16:17:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 16:20:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 16:20:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 16:22:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 16:22:15 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 30 16:22:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 16:22:25 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 16:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 16:29:41 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 16:32:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 16:32:01 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 16:32:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 16:32:33 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 30 16:33:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 16:33:47 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 30 16:40:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 16:40:05 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 16:42:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 16:42:46 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 30 16:43:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 16:43:39 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 16:43:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 16:43:48 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 30 16:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 16:50:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 30 16:52:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 16:52:50 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 30 16:56:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 16:56:49 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 30 16:57:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 16:57:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 17:00:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 17:00:44 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 17:02:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 17:02:52 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 17:06:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 30 17:06:56 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 17:11:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 17:11:01 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 17:11:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 17:11:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 30 17:12:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 17:12:52 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 30 17:19:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 17:19:50 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 30 17:21:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 17:21:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 17:22:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Jul 30 17:22:55 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 30 17:24:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 17:24:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 30 17:31:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 17:31:33 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 30 17:32:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 17:32:48 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 17:32:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 17:32:58 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 30 17:34:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 17:34:33 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 17:41:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 17:41:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 17:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 17:43:05 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 30 17:45:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.29.8@o2ib6, removing former export from same NID Jul 30 17:45:04 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 30 17:49:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 17:49:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 17:51:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 17:51:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 17:53:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 17:53:05 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 30 17:53:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24ccc0bc00, cur 1564534432 expire 1564534282 last 1564534205 Jul 30 17:56:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 17:56:11 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 30 18:01:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 18:01:39 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 30 18:03:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 18:03:14 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 30 18:05:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 18:06:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 18:06:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 18:07:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2faa2d6800, cur 1564535274 expire 1564535124 last 1564535047 Jul 30 18:11:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 18:11:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 18:13:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 18:13:23 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 30 18:16:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 30 18:16:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 18:18:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 18:18:23 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 30 18:22:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 18:22:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 18:23:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 18:23:47 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 30 18:26:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 18:26:20 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 18:29:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 18:29:51 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 30 18:32:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 18:32:37 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 30 18:35:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f324c17a400, cur 1564536910 expire 1564536760 last 1564536683 Jul 30 18:35:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 18:35:15 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 30 18:37:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 18:37:24 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 30 18:40:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 18:40:02 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 30 18:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 18:43:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 30 18:45:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 18:45:31 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 30 18:50:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 18:50:55 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 30 18:51:50 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 30 18:51:50 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Jul 30 18:52:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 18:52:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 18:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 18:54:30 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 30 18:55:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 18:55:34 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 30 19:02:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 19:02:10 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 19:04:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 19:04:39 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 30 19:05:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 19:05:39 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 30 19:08:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 19:08:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 19:14:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 19:14:16 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 30 19:14:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 19:14:40 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 30 19:15:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 19:15:42 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 30 19:20:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 19:20:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 19:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 19:24:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 30 19:25:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 19:25:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 30 19:25:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 19:25:49 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 30 19:34:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 19:34:57 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 19:35:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export 
from same NID Jul 30 19:35:27 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 30 19:35:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 19:35:52 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Jul 30 19:38:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 19:38:14 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 30 19:38:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b7460a400, cur 1564540724 expire 1564540574 last 1564540497 Jul 30 19:44:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 19:44:58 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 19:46:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 19:46:06 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 30 19:46:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 19:46:06 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 30 19:52:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3e80735400, cur 1564541531 expire 1564541381 last 1564541304 Jul 30 19:53:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 19:53:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 19:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 19:55:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 19:56:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 19:56:11 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 30 19:56:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 19:56:16 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 20:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 20:06:17 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 20:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 20:06:17 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 30 20:06:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 20:06:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 30 20:11:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 20:11:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 20:16:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 20:16:23 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 30 20:16:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 20:16:23 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 30 20:17:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 20:17:11 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 20:22:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 20:22:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 20:26:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 20:26:28 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 20:27:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 20:27:04 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 20:31:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 20:31:20 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 30 20:33:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 20:33:42 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 20:36:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 20:36:51 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 30 20:37:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 20:37:35 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 30 20:41:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 20:41:27 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 30 20:47:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 20:47:12 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 30 20:48:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 20:48:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 20:48:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 20:48:46 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 20:52:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 20:52:05 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 30 20:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 20:57:18 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Jul 30 20:58:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 20:58:54 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 21:01:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 21:01:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 21:02:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 21:02:21 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 30 21:07:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 21:07:19 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 30 21:08:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 21:08:55 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 21:12:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 21:12:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 21:13:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 21:13:22 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 30 21:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 21:17:27 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 30 21:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 21:19:00 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 30 21:23:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 30 21:23:23 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 21:27:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 21:27:28 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 30 21:28:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 21:28:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 21:29:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 30 21:29:03 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 30 21:35:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 21:35:41 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 30 21:37:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 21:37:43 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 30 21:38:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 21:38:37 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 21:39:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 21:39:03 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 30 21:45:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 21:45:50 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 30 21:47:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 21:47:51 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 30 21:49:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 21:49:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 30 21:55:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 21:55:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 21:57:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 21:57:11 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 30 21:57:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 30 21:57:52 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 30 21:59:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 21:59:32 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 30 22:04:54 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 22:05:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 22:05:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 22:07:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 22:07:15 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 30 22:07:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 22:07:54 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 30 22:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 22:09:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 22:17:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 22:17:16 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 30 22:18:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 22:18:19 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 30 22:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 22:19:56 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 30 22:23:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 22:23:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 30 22:27:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 22:27:29 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 30 22:28:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 22:28:39 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 30 22:29:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 30 22:29:58 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 30 22:31:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 22:37:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 22:37:11 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 30 22:38:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 22:38:03 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 30 22:38:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 22:38:55 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 30 22:40:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 22:40:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 22:48:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 22:48:08 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 30 22:49:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 22:49:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 22:49:03 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 22:49:03 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 30 22:50:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 22:50:11 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 22:58:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 22:58:09 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 30 22:59:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 22:59:04 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 30 22:59:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 22:59:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 23:00:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 23:00:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 30 23:09:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 23:09:12 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 30 23:09:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 23:09:14 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 30 23:10:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 23:10:29 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 23:15:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 23:15:44 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 30 23:19:16 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Jul 30 23:19:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 23:19:18 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 30 23:19:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 23:19:18 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 30 23:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 23:21:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 30 23:29:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 23:29:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 23:29:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 30 23:29:26 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 30 23:29:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 30 23:29:26 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 30 23:31:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 30 23:31:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 30 23:39:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 30 23:39:46 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 30 23:40:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 23:40:40 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 30 23:41:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 30 23:41:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 30 23:44:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 30 23:44:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 30 23:44:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 30 23:49:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 30 23:49:47 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 30 23:50:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 30 23:50:43 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 30 23:51:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 30 23:51:33 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 30 23:57:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 30 23:57:44 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 30 23:59:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 30 23:59:56 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 31 00:01:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 00:01:45 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 00:02:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 00:02:11 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 00:08:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 00:08:27 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 00:10:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 00:10:19 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 31 00:11:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 00:11:48 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 00:12:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 00:12:14 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 31 00:20:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 00:20:26 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 31 00:20:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 00:20:35 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 00:22:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 00:22:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 31 00:22:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 00:22:25 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 31 00:30:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 00:30:27 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 31 00:32:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 00:32:43 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 31 00:35:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 00:35:39 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 31 00:37:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 00:37:52 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 00:40:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 00:40:51 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 00:43:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 00:43:00 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 00:45:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 00:45:41 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 00:49:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 00:49:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 00:50:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 00:50:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 31 00:53:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 00:53:05 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 00:56:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 00:56:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 31 01:01:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 01:01:17 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 01:02:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 01:02:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 01:03:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 01:03:21 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 31 01:06:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 01:06:58 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 31 01:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 01:11:21 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 31 01:13:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 01:13:23 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 31 01:14:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 01:14:03 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 01:17:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 01:17:06 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 31 01:22:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 31 01:22:03 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 31 01:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 01:23:30 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 31 01:24:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2648532400, cur 1564561448 expire 1564561298 last 1564561221 Jul 31 01:24:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 01:24:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 01:28:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 31 01:28:04 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 31 01:32:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 01:32:13 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 31 01:33:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 01:33:31 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 01:34:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 01:34:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 01:39:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 01:39:18 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 31 01:42:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 01:42:14 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 01:43:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 01:43:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 01:47:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 01:47:19 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 01:49:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 31 01:49:20 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 01:52:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 01:52:14 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 31 01:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 01:54:17 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 01:57:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 01:57:21 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 31 01:59:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 01:59:22 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 31 02:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 02:02:23 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 31 02:04:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 02:04:23 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 31 02:08:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 02:08:57 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 31 02:12:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 02:12:31 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Jul 31 02:12:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 02:12:31 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 31 02:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 02:15:17 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 31 02:22:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 02:22:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 02:22:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 02:22:32 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 31 02:22:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 02:22:32 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 31 02:26:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 02:26:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 31 02:32:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 02:32:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 02:32:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 02:32:34 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 31 02:32:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 02:32:50 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 31 02:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 02:37:47 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 02:40:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f172a8f5000, cur 1564566018 expire 1564565868 last 1564565791 Jul 31 02:42:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 02:42:35 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 31 02:42:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 02:42:55 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 31 02:43:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 02:43:31 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 02:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 02:48:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 31 02:52:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 02:52:39 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 31 02:53:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 02:53:00 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 31 02:53:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 02:53:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 02:58:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 02:58:54 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 03:02:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 03:02:47 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 03:03:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 03:03:06 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 31 03:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 03:09:04 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 03:12:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 03:12:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 03:13:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 03:13:15 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 31 03:13:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 03:13:17 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 31 03:19:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25351d4400, cur 1564568374 expire 1564568224 last 1564568147 Jul 31 03:20:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 03:20:25 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 31 03:24:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 03:24:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 03:25:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 03:25:05 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 31 03:26:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 03:26:26 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 31 03:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 03:30:37 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 31 03:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 03:35:26 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 03:38:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 03:38:09 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 03:40:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 03:40:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 03:42:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 03:42:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 03:45:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 03:45:45 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 03:48:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 03:48:10 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 31 03:50:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15e8e35c00, cur 1564570200 expire 1564570050 last 1564569973 Jul 31 03:51:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 03:51:13 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 03:54:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 03:54:37 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 03:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 03:56:04 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 31 04:00:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 04:00:51 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 31 04:01:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 04:01:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 04:06:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 04:06:09 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 31 04:12:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 04:12:07 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 31 04:12:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 04:12:12 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 31 04:12:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 04:12:36 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 04:16:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 04:16:31 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 31 04:22:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 04:22:13 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 31 04:22:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 04:22:51 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 31 04:25:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 04:25:08 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 04:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1de8c3f400, cur 1564572322 expire 1564572172 last 1564572095 Jul 31 04:26:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 04:26:36 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 31 04:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 04:33:14 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 31 04:34:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 04:34:55 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 31 04:34:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f203da1f800, cur 1564572899 expire 1564572749 last 1564572672 Jul 31 04:36:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 04:36:47 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 31 04:39:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 04:39:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 04:43:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 04:43:30 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 31 04:45:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 04:45:30 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 31 04:47:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 04:47:01 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 31 04:50:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 04:50:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 04:53:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 04:53:39 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 04:57:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 04:57:22 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 31 05:00:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 05:00:51 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 31 05:05:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 05:05:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 31 05:05:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 05:05:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 05:07:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 05:07:31 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 31 05:11:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 05:11:00 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 31 05:15:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 05:15:23 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 31 05:16:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ca54da400, cur 1564575372 expire 1564575222 last 1564575145 Jul 31 05:17:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 05:17:34 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Jul 31 05:18:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 05:18:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 05:21:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 05:21:14 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 31 05:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1d9ecc6400, cur 1564575839 expire 1564575689 last 1564575612 Jul 31 05:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 05:25:29 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 31 05:27:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 05:27:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 05:28:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 31 05:28:51 fir-md1-s1 kernel: Lustre: 31007:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f38ade68050 x1638083684648032/t0(0) o103->0eaeb89b-859f-1fc8-d1f0-672563c1d160@10.8.8.24@o2ib6:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Jul 31 05:28:51 fir-md1-s1 kernel: Lustre: 31007:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message Jul 31 05:28:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Jul 31 05:28:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.203@o2ib7 (5): c: 8, oc: 0, rc: 8 Jul 31 05:28:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Jul 31 05:28:51 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 0+0s req@ffff8f25a7b5d050 x1631613941458960/t0(0) o3->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1564576131 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 05:28:51 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 
05:28:51 fir-md1-s1 kernel: LustreError: 20196:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e83826400 Jul 31 05:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with f7eae5f9-18e9-99eb-0207-24a1fdf92451 (at 10.9.113.2@o2ib4), client will retry: rc = -110 Jul 31 05:28:51 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 31 05:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 40db60e6-2b5f-e52d-2610-43b84e2f829d (at 10.8.29.1@o2ib6), client will retry: rc -110 Jul 31 05:28:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Jul 31 05:31:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 05:31:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 05:31:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 05:31:33 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 31 05:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 05:35:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 31 05:37:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 05:37:39 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 31 05:41:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 05:41:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 05:45:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 31 05:45:17 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 05:46:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 05:46:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 31 05:47:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 05:47:51 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 31 05:51:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 05:51:56 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 31 05:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 05:56:20 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 05:58:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 05:58:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 05:58:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 05:58:00 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 31 06:02:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 31 06:02:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 31 06:06:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 06:06:42 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 31 06:08:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 06:08:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 31 06:12:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 06:12:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 06:12:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 06:12:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 06:17:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 06:17:56 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 31 06:19:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 06:19:26 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 31 06:22:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 06:22:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 06:22:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 06:22:59 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 06:28:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 06:28:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Jul 31 06:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 06:29:35 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 31 06:33:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 06:33:05 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 06:33:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 06:33:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 31 06:39:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 06:39:48 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 31 06:39:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 06:39:48 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 06:44:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 06:44:09 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 06:45:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 06:45:57 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 06:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 06:50:02 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 31 06:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 06:50:02 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 06:55:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 06:55:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 06:56:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 06:56:02 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 31 07:00:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 07:00:07 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 07:00:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 07:00:07 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 07:06:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 07:06:04 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 31 07:06:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 07:06:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 07:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 07:10:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 07:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 07:10:16 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Jul 31 07:16:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 07:16:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 07:17:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 07:17:05 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 31 07:20:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 07:20:20 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 31 07:20:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 07:20:27 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 07:27:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 07:27:49 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 31 07:28:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 07:28:12 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 07:30:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 07:30:45 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 31 07:31:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 07:31:17 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 07:37:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 07:37:53 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 07:40:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 07:40:12 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 07:40:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 07:40:58 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 31 07:42:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 07:42:19 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 07:48:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 07:48:44 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Jul 31 07:51:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 07:51:37 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 31 07:52:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 
10.8.30.23@o2ib6) reconnecting Jul 31 07:52:30 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Jul 31 07:52:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 07:52:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 07:58:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 07:58:50 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 08:01:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 08:01:53 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 08:04:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 08:04:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Jul 31 08:07:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 08:07:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 08:08:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 08:08:50 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Jul 31 08:13:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 08:13:03 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 31 08:14:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 08:14:25 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 31 08:18:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 08:18:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 08:20:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 08:20:58 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 31 08:23:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 08:23:13 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 31 08:24:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 08:24:32 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 08:29:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 08:29:57 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 08:33:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 08:33:13 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 08:33:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 08:33:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 08:34:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 08:34:56 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 08:36:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ce5c450b-6385-996a-5b75-5f7f2f80164d (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f329a683400, cur 1564587405 expire 1564587255 last 1564587178 Jul 31 08:36:57 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client fbcfd0cb-588e-c2b6-fda6-1eadd71eb54e (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24a8bfc400, cur 1564587417 expire 1564587267 last 1564587190 Jul 31 08:36:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 08:37:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ce5c450b-6385-996a-5b75-5f7f2f80164d (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2fdc5c5400, cur 1564587434 expire 1564587284 last 1564587207 Jul 31 08:38:39 fir-md1-s1 kernel: LustreError: 97646:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2297bbd700 x1636751968209632/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 08:38:39 fir-md1-s1 kernel: LustreError: 97646:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 9 previous similar messages Jul 31 08:38:54 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f16fc848c00 x1639518715984016/t0(0) o36->04c17dce-45f1-fe7e-2627-7efeaaeaddb9@10.9.0.62@o2ib4:29/0 lens 496/448 e 1 to 0 dl 1564587539 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 08:39:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2ecf628900/0x5d9ee6a1e69428d3 lrc: 3/0,0 mode: PR/PR res: [0x200029e16:0x68c2:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.15.6@o2ib6 remote: 0xe8ba7c3721b20efc expref: 409443 pid: 23761 timeout: 3702608 lvb_type: 0 Jul 31 08:39:17 fir-md1-s1 kernel: Lustre: 23715:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3f33765400 x1639518716234288/t0(0) o101->04c17dce-45f1-fe7e-2627-7efeaaeaddb9@10.9.0.62@o2ib4:22/0 lens 576/3264 e 1 to 0 dl 1564587562 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 08:39:28 fir-md1-s1 kernel: LustreError: 21412:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f304e350900 x1636751968694592/t0(0) o104->fir-MDT0000@10.8.15.6@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 08:39:43 fir-md1-s1 kernel: Lustre: 23725:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f2f8c14bc00 x1636354976878352/t0(0) o101->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:18/0 lens 1784/3288 e 1 to 0 dl 1564587588 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 08:39:55 fir-md1-s1 kernel: Lustre: 23649:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0df09d0f00 x1638896358684160/t0(0) o101->d1800347-72ce-eadd-608d-51a435000390@10.9.112.15@o2ib4:29/0 lens 576/3264 e 0 to 0 dl 1564587599 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 08:39:57 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f317a98f980/0x5d9ee6a1e6946d44 lrc: 3/0,0 mode: PR/PR res: [0x2000298ba:0x433f:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.15.6@o2ib6 remote: 0xe8ba7c3721b20f34 expref: 324730 pid: 97652 timeout: 3702657 lvb_type: 0 Jul 31 08:40:09 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564587519, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1980ddc380/0x5d9ee6a1ea0336f5 lrc: 3/0,1 mode: --/CW res: [0x200029e16:0x68c2:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97646 timeout: 0 lvb_type: 0 Jul 31 08:40:09 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 10 previous similar messages Jul 31 08:40:19 fir-md1-s1 kernel: Lustre: 23760:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d99b58c00 x1638804678260976/t0(0) o101->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:23/0 lens 576/3264 e 0 to 0 dl 1564587623 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 08:40:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not 
available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 08:40:20 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 08:40:32 fir-md1-s1 kernel: LustreError: 23719:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564587542, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f398d311440/0x5d9ee6a1ea2d5008 lrc: 3/1,0 mode: --/PR res: [0x200029e16:0x68c2:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23719 timeout: 0 lvb_type: 0 Jul 31 08:40:58 fir-md1-s1 kernel: LustreError: 21412:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564587568, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3451797500/0x5d9ee6a1ea5e41de lrc: 3/0,1 mode: --/CW res: [0x2000298ba:0x433f:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21412 timeout: 0 lvb_type: 0 Jul 31 08:41:23 fir-md1-s1 kernel: LustreError: 23734:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564587593, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3077244ec0/0x5d9ee6a1ea8c217d lrc: 3/1,0 mode: --/PR res: [0x2000298ba:0x433f:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23734 timeout: 0 lvb_type: 0 Jul 31 08:41:23 fir-md1-s1 kernel: LustreError: 23734:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Jul 31 08:41:59 fir-md1-s1 kernel: LNet: Service thread pid 97646 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 31 08:41:59 fir-md1-s1 kernel: Pid: 97646, comm: mdt01_085 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 31 08:41:59 fir-md1-s1 kernel: Call Trace: Jul 31 08:41:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 31 08:41:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_create+0x569/0x1090 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Jul 31 08:41:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 31 08:41:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 31 08:41:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 31 08:41:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 31 08:41:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 31 08:41:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 31 08:41:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564587719.97646 Jul 31 08:42:22 fir-md1-s1 kernel: LNet: Service thread pid 23719 was inactive for 200.06s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jul 31 08:42:22 fir-md1-s1 kernel: Pid: 23719, comm: mdt03_099 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 31 08:42:22 fir-md1-s1 kernel: Call Trace: Jul 31 08:42:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 31 08:42:22 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 31 08:42:22 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 31 08:42:22 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 31 08:42:22 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 31 08:42:22 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 31 08:42:22 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 31 08:42:22 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 31 08:42:22 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 31 08:42:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564587742.23719 Jul 31 08:42:32 fir-md1-s1 kernel: LustreError: 23695:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564587662, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f28de544ec0/0x5d9ee6a1eb05562a lrc: 3/1,0 mode: --/PR res: [0x200029e16:0x68c2:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23695 timeout: 0 lvb_type: 0 
Jul 31 08:42:32 fir-md1-s1 kernel: LustreError: 23695:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Jul 31 08:42:42 fir-md1-s1 kernel: LNet: Service thread pid 23719 completed after 220.48s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jul 31 08:42:42 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 31 08:42:48 fir-md1-s1 kernel: LNet: Service thread pid 21412 was inactive for 200.53s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jul 31 08:42:48 fir-md1-s1 kernel: Pid: 21412, comm: mdt02_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 31 08:42:48 fir-md1-s1 kernel: Call Trace: Jul 31 08:42:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 31 08:42:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 31 08:42:48 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 31 08:42:48 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 31 08:42:48 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 31 08:42:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 31 08:42:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 31 08:42:49 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 31 08:42:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 31 08:42:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 31 08:42:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 31 08:42:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564587769.21412 Jul 31 08:42:50 fir-md1-s1 kernel: Pid: 23612, comm: mdt00_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Jul 31 08:42:50 fir-md1-s1 kernel: Call Trace: Jul 31 08:42:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jul 31 08:42:50 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Jul 31 08:42:50 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Jul 31 08:42:50 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Jul 31 08:42:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Jul 31 08:42:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Jul 31 08:42:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Jul 31 08:42:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jul 31 08:42:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Jul 31 08:42:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1564587770.23612 Jul 31 08:43:03 fir-md1-s1 kernel: LNet: Service thread pid 21412 completed after 215.17s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jul 31 08:43:03 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Jul 31 08:43:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 08:43:16 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Jul 31 08:44:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 08:44:46 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 31 08:44:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 08:44:57 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 31 08:51:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 08:51:36 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 08:54:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 08:54:12 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 31 08:54:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 08:54:57 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 08:55:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 08:55:14 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 31 09:04:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 09:04:13 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Jul 31 09:04:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former 
export from same NID Jul 31 09:04:57 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 31 09:05:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 09:05:37 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 09:08:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 09:08:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 09:14:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 09:14:17 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Jul 31 09:14:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5f2a1c36-7d7e-0180-c814-e39cd37d2493 (at 10.8.20.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24effe4400, cur 1564589667 expire 1564589517 last 1564589440 Jul 31 09:15:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 09:15:38 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 09:21:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 09:21:01 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Jul 31 09:24:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 09:24:20 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 31 09:25:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 09:25:05 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 09:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 09:25:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 31 09:26:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17f47ab400, cur 1564590369 expire 1564590219 last 1564590142 Jul 31 09:26:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 09:28:36 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564590506/real 1564590506] req@ffff8f1d2f125a00 x1636751987935632/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564590516 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 09:28:46 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564590516/real 1564590516] req@ffff8f1d2f125a00 x1636751987935632/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564590526 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 09:28:51 fir-md1-s1 kernel: Lustre: 23698:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0e92afd700 x1638804680056576/t0(0) o101->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:26/0 lens 480/568 e 0 to 0 dl 1564590536 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 09:29:00 fir-md1-s1 kernel: Lustre: 21181:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564590530/real 1564590530] req@ffff8f27a31f1b00 x1636751988158736/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564590540 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 09:29:05 fir-md1-s1 
kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f280f3b2700 x1635695926531072/t0(0) o101->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:10/0 lens 480/568 e 1 to 0 dl 1564590550 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 09:29:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 49s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f27f3e172c0/0x5d9ee6a2056aaa9d lrc: 3/0,0 mode: PR/PR res: [0x200029d54:0x1da47:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81774d596fa9 expref: 305423 pid: 21412 timeout: 3705615 lvb_type: 0 Jul 31 09:29:15 fir-md1-s1 kernel: LustreError: 26626:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1564590555 with bad export cookie 6746082742741860067 Jul 31 09:29:15 fir-md1-s1 kernel: LustreError: 26626:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Jul 31 09:29:16 fir-md1-s1 kernel: LustreError: 48114:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1564590556 with bad export cookie 6746082742741860067 Jul 31 09:29:16 fir-md1-s1 kernel: LustreError: 48114:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 81 previous similar messages Jul 31 09:29:18 fir-md1-s1 kernel: LustreError: 48114:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1564590558 with bad export cookie 6746082742741860067 Jul 31 09:29:18 fir-md1-s1 kernel: LustreError: 48114:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 184 previous similar messages Jul 31 09:29:22 fir-md1-s1 kernel: LustreError: 26888:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1564590562 with bad export cookie 6746082742741860067 Jul 31 09:29:22 fir-md1-s1 kernel: LustreError: 
26888:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 288 previous similar messages Jul 31 09:29:30 fir-md1-s1 kernel: LustreError: 31012:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1564590570 with bad export cookie 6746082742741860067 Jul 31 09:29:30 fir-md1-s1 kernel: LustreError: 31012:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 644 previous similar messages Jul 31 09:29:33 fir-md1-s1 kernel: LustreError: 10585:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2848acb900 x1636751988508256/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 09:29:43 fir-md1-s1 kernel: LustreError: 23759:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2a71dcef00 x1636751988580144/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 09:29:50 fir-md1-s1 kernel: LustreError: 23746:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2f095afb00 x1636751988629056/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 09:29:50 fir-md1-s1 kernel: LustreError: 23746:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 31 09:29:58 fir-md1-s1 kernel: Lustre: 21380:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f280f3b2700 x1633731072220224/t0(0) o101->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:3/0 lens 480/568 e 1 to 0 dl 1564590603 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 09:30:05 fir-md1-s1 kernel: Lustre: 21181:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3511d9b600 x1639958825877808/t0(0) o101->f111b25a-6d2a-16a8-5df8-392d9e810365@10.8.15.4@o2ib6:10/0 lens 480/568 e 1 to 0 dl 1564590610 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 
09:30:05 fir-md1-s1 kernel: Lustre: 21181:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 31 09:30:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f3250192d00/0x5d9ee6a2056b7a43 lrc: 3/0,0 mode: PR/PR res: [0x200029ecb:0x2ee:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81774d5995dc expref: 121076 pid: 23656 timeout: 3705672 lvb_type: 0 Jul 31 09:30:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f299d238240/0x5d9ee6a20562d690 lrc: 3/0,0 mode: PR/PR res: [0x200029ec5:0x299:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81774d57fd4a expref: 110052 pid: 21412 timeout: 3705679 lvb_type: 0 Jul 31 09:30:19 fir-md1-s1 kernel: LustreError: 23691:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564590529, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0e3dde4380/0x5d9ee6a2060716b0 lrc: 3/0,1 mode: --/PW res: [0x200029d54:0x1da47:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23691 timeout: 0 lvb_type: 0 Jul 31 09:30:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 31 09:30:20 fir-md1-s1 kernel: LustreError: 24584:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f173dbde000 x1636751988866976/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 09:30:36 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ 
Couldn't add any time (5/5), not sending early reply req@ffff8f1bd0462d00 x1636354987792768/t0(0) o101->f7eae5f9-18e9-99eb-0207-24a1fdf92451@10.9.113.2@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1564590640 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 09:30:50 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f15d9142640/0x5d9ee6a20458f0bb lrc: 3/0,0 mode: PR/PR res: [0x2000297f4:0x114dd:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81774cf4b7ae expref: 67162 pid: 97662 timeout: 3705710 lvb_type: 0 Jul 31 09:30:56 fir-md1-s1 kernel: LustreError: 23673:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2e1e0fec00 x1636751989209520/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 09:31:13 fir-md1-s1 kernel: LustreError: 23759:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564590583, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f270e1ee0c0/0x5d9ee6a2067058b2 lrc: 3/0,1 mode: --/PW res: [0x200029937:0x1aa5:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23759 timeout: 0 lvb_type: 0 Jul 31 09:31:21 fir-md1-s1 kernel: Lustre: 21671:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26e0889b00 x1636469303974976/t437271321317(0) o36->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:26/0 lens 488/3152 e 0 to 0 dl 1564590686 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 09:31:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f297170a400/0x5d9ee6a205593a87 
lrc: 3/0,0 mode: PR/PR res: [0x20000fb8f:0x690:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81774d5607d5 expref: 26400 pid: 23733 timeout: 3705745 lvb_type: 0 Jul 31 09:32:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 09:32:07 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 31 09:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 09:34:24 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Jul 31 09:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 09:35:53 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 31 09:37:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 09:37:35 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 31 09:42:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 09:42:48 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 09:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 09:44:53 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 09:46:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 09:46:20 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 09:49:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 31 09:49:07 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 09:55:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 09:55:25 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 31 09:55:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 09:55:25 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 09:56:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 09:56:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 10:02:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 10:02:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 10:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 10:05:29 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Jul 31 10:06:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 10:06:56 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 31 10:07:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 10:07:08 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 10:16:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 10:16:09 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 31 10:16:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 10:16:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 10:17:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 10:17:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 31 10:20:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 10:20:29 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 10:26:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 10:26:10 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Jul 31 10:27:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 10:27:39 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 10:28:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2e786a3c-ce2d-4aaf-f308-e7273c15a682 (at 10.8.9.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24f3eeac00, cur 1564594082 expire 1564593932 last 1564593855 Jul 31 10:28:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 10:28:13 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 10:31:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 10:31:23 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Jul 31 10:36:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 31 10:36:16 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Jul 31 10:38:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 10:38:16 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 10:38:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 10:38:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 10:41:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 31 10:41:26 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 31 10:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 31 10:46:56 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 31 10:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 10:48:16 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 31 10:49:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 10:49:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 10:51:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 10:51:28 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 10:54:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f5726655-a02d-c171-982b-0e82e30dea86 (at 10.9.106.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0c09fd7000, cur 1564595680 expire 1564595530 last 1564595453 Jul 31 10:54:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 10:56:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 10:56:57 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Jul 31 10:58:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 10:58:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 11:00:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 11:00:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 11:04:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c7788796-a4a4-39ed-fee9-e89a99a8ee3d (at 10.9.108.65@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4445243000, cur 1564596261 expire 1564596111 last 1564596034 Jul 31 11:04:21 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 31 11:05:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 11:05:13 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 31 11:07:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 11:07:09 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 11:09:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 11:09:35 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 31 11:10:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 11:10:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 11:16:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 11:16:14 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 11:17:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 11:17:36 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Jul 31 11:19:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 11:19:48 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 31 11:26:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 11:26:02 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 11:28:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 11:28:14 fir-md1-s1 kernel: Lustre: Skipped 123 previous similar messages Jul 31 11:29:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 11:29:07 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 31 11:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 11:31:23 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 31 11:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 300a3ee0-8aad-40e7-4a81-2675e4ac6d90 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f88f6800, cur 1564598274 expire 1564598124 last 1564598047 Jul 31 11:37:54 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 31 11:38:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 11:38:40 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 11:39:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 11:39:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 11:41:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 11:41:13 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 11:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 11:41:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 11:48:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 11:48:43 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 31 11:51:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 11:51:14 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 11:51:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 11:51:42 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 11:51:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 11:51:55 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 31 11:58:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 11:58:47 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 12:01:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 12:01:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 12:02:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 12:02:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 31 12:02:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 12:02:42 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 12:09:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 12:09:04 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 31 12:12:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 12:12:03 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 12:12:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 12:12:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 31 12:13:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 12:13:54 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 12:19:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 12:19:21 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Jul 31 12:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 12:23:03 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 12:23:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 12:23:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 31 12:29:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 12:29:22 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Jul 31 12:29:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 12:29:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 12:33:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 12:33:07 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 31 12:34:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 12:34:56 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 31 12:39:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 12:39:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 31 12:42:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 12:42:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 12:43:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 12:43:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 12:45:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 12:45:26 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 12:49:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 31 12:49:26 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 31 12:52:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 12:52:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 12:53:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 12:53:14 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 31 12:55:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 12:55:36 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 12:59:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 12:59:32 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Jul 31 13:03:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 13:03:18 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 31 13:03:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 13:03:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 13:06:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 13:06:00 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Jul 31 13:09:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 13:09:42 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 31 13:13:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 13:13:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Jul 31 13:16:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 31 13:16:02 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 31 13:18:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 13:18:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 13:20:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 13:20:16 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Jul 31 13:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 13:23:19 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 13:26:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 13:26:58 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 13:28:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 13:28:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 13:30:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 13:30:26 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Jul 31 13:35:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 13:35:02 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 31 13:37:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 13:37:15 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 31 13:40:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 13:40:27 fir-md1-s1 kernel: Lustre: Skipped 125 previous similar messages Jul 31 13:41:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 13:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 13:45:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 31 13:48:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 13:48:24 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 31 13:50:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 13:50:28 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 13:52:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 13:52:48 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 13:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 13:56:04 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 13:56:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f21bbad3c00, cur 1564606570 expire 1564606420 last 1564606343 Jul 31 13:56:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 13:58:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 31 13:58:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 31 14:00:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 14:00:33 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Jul 31 14:06:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 14:06:10 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 14:08:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 14:08:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 14:08:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 14:08:52 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 31 14:10:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 14:10:35 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 31 14:16:20 fir-md1-s1 kernel: Lustre: 55551:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06e0199450 x1631341008725392/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:25/0 lens 304/240 e 1 to 0 dl 1564607785 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:16:25 fir-md1-s1 kernel: LustreError: 55488:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f06e0199450 x1631341008725392/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:25/0 lens 304/240 e 1 to 0 dl 1564607785 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:16:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 06d24d01-86fd-11a8-6dcf-d16043d84c98 (at 10.8.25.8@o2ib6) reconnecting Jul 31 14:16:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 14:16:58 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1b4df7b600 x1631774821924144/t0(0) o101->ee2fc29b-a70b-6d11-2477-6e5c3f3348b3@10.8.20.18@o2ib6:3/0 lens 376/1600 e 1 to 0 dl 1564607823 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:17:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.20.18@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f06a3d272c0/0x5d9ee6a2a71bd453 lrc: 3/0,0 mode: CR/CR res: [0x200029afe:0x25:0x0].0x0 bits 0x8/0x0 
rrc: 6 type: IBT flags: 0x60000400000020 nid: 10.8.20.18@o2ib6 remote: 0x48a112d937df5e4a expref: 85 pid: 23589 timeout: 3722892 lvb_type: 3 Jul 31 14:17:12 fir-md1-s1 kernel: LustreError: 20729:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2521240400 ns: mdt-fir-MDT0000_UUID lock: ffff8f1fedbb7080/0x5d9ee6a2a71c18cb lrc: 1/0,0 mode: EX/EX res: [0x200029afe:0x25:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.20.18@o2ib6 remote: 0x48a112d937df5e9e expref: 16 pid: 20729 timeout: 0 lvb_type: 3 Jul 31 14:17:12 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f1b4df7b600 x1631774821924144/t437336486484(0) o101->ee2fc29b-a70b-6d11-2477-6e5c3f3348b3@10.8.20.18@o2ib6:3/0 lens 376/1568 e 1 to 0 dl 1564607823 ref 1 fl Complete:/0/0 rc -107/-107 Jul 31 14:18:26 fir-md1-s1 kernel: Lustre: 55551:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f06e0198450 x1631341008747024/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:1/0 lens 304/240 e 0 to 0 dl 1564607911 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:18:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 14:18:30 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 14:18:31 fir-md1-s1 kernel: LustreError: 55488:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f06e0198450 x1631341008747024/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:1/0 lens 304/240 e 0 to 0 dl 1564607911 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to edf08e2d-d0f9-f838-1112-b9395746d00f (at 10.8.20.18@o2ib6) Jul 31 14:20:41 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 31 14:20:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.25.8@o2ib6, removing former export from same NID Jul 31 14:20:41 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 31 14:26:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 769d013d-f990-3399-dde8-f67f737a957d (at 10.8.7.25@o2ib6) reconnecting Jul 31 14:26:38 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 31 14:27:23 fir-md1-s1 kernel: Lustre: 55550:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0fe7161050 x1631341008813344/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:27/0 lens 304/240 e 1 to 0 dl 1564608447 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:27:27 fir-md1-s1 kernel: LustreError: 55491:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f0fe7161050 x1631341008813344/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:27/0 lens 304/240 e 1 to 0 dl 1564608447 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:29:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 14:29:04 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 31 14:29:22 fir-md1-s1 kernel: Lustre: 55538:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1c46470050 x1631341008826400/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:27/0 lens 304/240 e 0 to 0 dl 1564608567 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:29:27 fir-md1-s1 kernel: LustreError: 55550:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1c46470050 x1631341008826400/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:27/0 lens 304/240 e 0 to 0 dl 1564608567 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:31:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.25@o2ib6, removing former export from same NID Jul 31 14:31:02 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 31 14:31:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c99924b3-32ea-2e12-d25f-b7eb0a477991 (at 10.8.7.25@o2ib6) Jul 31 14:31:02 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 31 14:34:59 fir-md1-s1 kernel: Lustre: 23756:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e5f01da00 x1637988433154832/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:4/0 lens 480/568 e 1 to 0 dl 1564608904 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:35:13 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.24@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0a26fba400/0x5d9ee6a2af198e72 lrc: 3/0,0 mode: PW/PW res: [0x2c002c57b:0x145c4:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.8.24@o2ib6 remote: 0x492f4229fc036179 expref: 3213 pid: 23558 timeout: 3723973 lvb_type: 0 Jul 31 14:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
ee2fc29b-a70b-6d11-2477-6e5c3f3348b3 (at 10.8.20.18@o2ib6) reconnecting Jul 31 14:39:04 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Jul 31 14:39:40 fir-md1-s1 kernel: Lustre: 55550:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23e8040050 x1631539266397584/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:15/0 lens 304/240 e 1 to 0 dl 1564609185 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:39:45 fir-md1-s1 kernel: LustreError: 55555:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f23e8040050 x1631539266397584/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:15/0 lens 304/240 e 1 to 0 dl 1564609185 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:40:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.27.16@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 14:40:42 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 31 14:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 00eb2007-a588-b422-45b7-3483c5c8a03a (at 10.8.25.8@o2ib6) Jul 31 14:42:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 31 14:44:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.25@o2ib6, removing former export from same NID Jul 31 14:44:17 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 14:49:12 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564609745/real 1564609745] req@ffff8f1f88ba1e00 x1636752154160368/t0(0) o104->fir-MDT0000@10.8.18.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564609752 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 14:49:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8bcbb71f-dec9-01fd-fa31-3d32f5a62a50 (at 10.8.8.23@o2ib6) reconnecting Jul 31 14:49:12 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 31 14:49:12 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jul 31 14:49:20 fir-md1-s1 kernel: Lustre: 21453:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bdb1e0850 x1638786425115904/t0(0) o4->64cd7216-d693-ed6b-ee4d-6e372402c9ad@10.8.27.6@o2ib6:25/0 lens 488/448 e 1 to 0 dl 1564609765 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 14:49:25 fir-md1-s1 kernel: LustreError: 35237:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f06b6f98450 x1631842612307824/t0(0) o4->1f5f304a-0842-58d3-d8f8-5b700ac24fca@10.8.20.26@o2ib6:9/0 lens 504/448 e 1 to 0 dl 1564609779 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:49:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 1f5f304a-0842-58d3-d8f8-5b700ac24fca (at 10.8.20.26@o2ib6), client will retry: rc = -110 Jul 31 14:49:25 
fir-md1-s1 kernel: LustreError: 27604:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1bdb1e0850 x1638786425115904/t0(0) o4->64cd7216-d693-ed6b-ee4d-6e372402c9ad@10.8.27.6@o2ib6:25/0 lens 488/448 e 1 to 0 dl 1564609765 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:49:32 fir-md1-s1 kernel: LustreError: 21006:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f4009ea6000 x1639983775822400/t0(0) o37->baaf9aa6-d6ac-d219-ff91-f47dd67dd412@10.8.29.6@o2ib6:2/0 lens 448/440 e 1 to 0 dl 1564609772 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:49:32 fir-md1-s1 kernel: Lustre: 25680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564609765/real 1564609765] req@ffff8f2eaf1f6f00 x1636752156174736/t0(0) o104->fir-MDT0002@10.8.8.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564609772 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 14:49:32 fir-md1-s1 kernel: Lustre: 25680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 31 14:50:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0c07312f40/0x5d9ee6a2b11af036 lrc: 3/0,0 mode: PR/PR res: [0x2c002c6a9:0xe9:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.21@o2ib6 remote: 0xe913e4ad0d15c5e2 expref: 1457 pid: 23612 timeout: 3724879 lvb_type: 0 Jul 31 14:50:20 fir-md1-s1 kernel: LustreError: 117779:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f32aaa93600 x1636752165059840/t0(0) o105->fir-MDT0002@10.8.8.21@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 14:50:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 31 14:50:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed 
out RDMA with 10.0.10.204@o2ib7 (106): c: 8, oc: 0, rc: 8 Jul 31 14:51:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.18.31@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 14:51:11 fir-md1-s1 kernel: LustreError: Skipped 60 previous similar messages Jul 31 14:52:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c99924b3-32ea-2e12-d25f-b7eb0a477991 (at 10.8.7.25@o2ib6) Jul 31 14:52:31 fir-md1-s1 kernel: Lustre: Skipped 538 previous similar messages Jul 31 14:53:48 fir-md1-s1 kernel: LustreError: 55538:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f164b048050 x1631539266498880/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:18/0 lens 304/240 e 1 to 0 dl 1564610028 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:54:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9c540990-8457-458f-eb50-06c483166dd3 (at 10.8.8.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4502b1d400, cur 1564610051 expire 1564609901 last 1564609824 Jul 31 14:54:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.25@o2ib6, removing former export from same NID Jul 31 14:54:38 fir-md1-s1 kernel: Lustre: Skipped 185 previous similar messages Jul 31 14:54:39 fir-md1-s1 kernel: Lustre: 26256:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564610072/real 1564610072] req@ffff8f1b6aa2b600 x1636752172228176/t0(0) o104->fir-MDT0000@10.8.18.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564610079 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 14:54:50 fir-md1-s1 kernel: LustreError: 44036:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1dfc9eb050 x1631809749701680/t0(0) o4->13061d85-51ac-4b0f-0a27-af4e7a3825e8@10.8.22.3@o2ib6:13/0 lens 504/448 e 0 to 0 dl 1564610113 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 13061d85-51ac-4b0f-0a27-af4e7a3825e8 (at 10.8.22.3@o2ib6), client will retry: rc = -110 Jul 31 14:54:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 14:54:53 fir-md1-s1 kernel: Lustre: 23753:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564610086/real 1564610086] req@ffff8f282a0de900 x1636752172528320/t0(0) o104->fir-MDT0002@10.8.8.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564610093 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 14:55:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1089c206c0/0x5d9ee6a2b18ac120 lrc: 3/0,0 mode: PR/PR res: [0x200029ecb:0x308:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81778350be22 expref: 255981 pid: 25676 timeout: 3725177 lvb_type: 0 Jul 31 14:55:18 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.19@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f3468b79f80/0x5d9ee6a2b4c5ad21 lrc: 3/0,0 mode: PR/PR res: [0x2c002c71e:0x25ff:0x0].0x0 bits 0x1b/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.8.19@o2ib6 remote: 0x1554fb9a75b5f113 expref: 1159 pid: 23756 timeout: 3725178 lvb_type: 0 Jul 31 14:55:29 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2ef11d5e80/0x5d9ee6a2b39f9867 lrc: 3/0,0 mode: PR/PR res: [0x2c002be2c:0xa10b:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.8.17@o2ib6 remote: 0x68316722d2aeda4e expref: 1086 pid: 20734 timeout: 3725189 lvb_type: 0 Jul 31 14:55:31 fir-md1-s1 kernel: LustreError: 83752:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2dbd435700 x1636752172827248/t0(0) o105->fir-MDT0002@10.8.17.26@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 14:55:40 fir-md1-s1 kernel: LustreError: 55538:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f3225b7a850 x1631361482739856/t0(0) o256->9b66bf74-4165-ae6a-63b8-3cf80fe40a18@10.8.20.27@o2ib6:10/0 lens 304/240 e 0 to 0 dl 1564610140 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 14:56:01 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f16b9a02f40/0x5d9ee6a2b177bb65 lrc: 3/0,0 mode: PR/PR res: [0x200029ec5:0x248:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a8177835025c6 expref: 107034 pid: 50446 timeout: 3725221 lvb_type: 0 Jul 31 14:56:01 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar 
message Jul 31 14:56:18 fir-md1-s1 kernel: LustreError: 24587:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1564610088, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2428e17500/0x5d9ee6a2b4d7eee3 lrc: 3/0,1 mode: --/PW res: [0x200029ecb:0x308:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24587 timeout: 0 lvb_type: 0 Jul 31 14:56:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 31 14:56:18 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (106): c: 8, oc: 0, rc: 8 Jul 31 14:56:35 fir-md1-s1 kernel: LustreError: 97664:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f22bb324b00 x1636752172970720/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 14:56:35 fir-md1-s1 kernel: LustreError: 97664:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Jul 31 14:57:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f16c4004380/0x5d9ee6a2b0310a1a lrc: 3/0,0 mode: PR/PR res: [0x20000fb1a:0x6ec:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81778300bf8d expref: 23969 pid: 21429 timeout: 3725284 lvb_type: 0 Jul 31 14:57:09 fir-md1-s1 kernel: LustreError: 23706:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f08b406c200 x1636752173058544/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 14:57:09 fir-md1-s1 kernel: LustreError: 23706:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous 
similar message Jul 31 14:59:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 92201019-2a0e-37b3-944e-b91d23afff01 (at 10.8.17.26@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e03e52000, cur 1564610364 expire 1564610214 last 1564610137 Jul 31 14:59:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 14:59:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06d24d01-86fd-11a8-6dcf-d16043d84c98 (at 10.8.25.8@o2ib6) reconnecting Jul 31 14:59:36 fir-md1-s1 kernel: Lustre: Skipped 743 previous similar messages Jul 31 15:01:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.7.25@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 15:01:12 fir-md1-s1 kernel: LustreError: Skipped 52 previous similar messages Jul 31 15:02:48 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564610561/real 1564610561] req@ffff8f29de7e9200 x1636752187603360/t0(0) o104->fir-MDT0000@10.8.18.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564610568 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:02:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 4ab591ab-ce2d-7651-671d-8ba61476cefb (at 10.8.16.3@o2ib6) Jul 31 15:02:48 fir-md1-s1 kernel: Lustre: Skipped 581 previous similar messages Jul 31 15:02:55 fir-md1-s1 kernel: LustreError: 46590:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f19fc8d1c50 x1631596353952224/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:18/0 lens 488/440 e 0 to 0 dl 1564610598 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:02:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 31 15:02:57 fir-md1-s1 kernel: LustreError: 71843:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on 
bulk READ req@ffff8f31d3ccb450 x1640305483733648/t0(0) o37->08221c4d-680b-0eb0-dfa4-ec6a7d978740@10.8.9.9@o2ib6:6/0 lens 448/440 e 1 to 0 dl 1564610586 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:02:57 fir-md1-s1 kernel: LustreError: 71843:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 31 15:02:59 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f268f63b850 x1632262396758736/t0(0) o4->d0718430-19b7-83a1-60a9-a08aa6574bf3@10.8.11.11@o2ib6:4/0 lens 504/448 e 1 to 0 dl 1564610584 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 15:02:59 fir-md1-s1 kernel: Lustre: 21538:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Jul 31 15:03:02 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f214463fc50 x1631596353952112/t0(0) o3->e5fcc30b-a575-210f-f263-a974ce8eedc2@10.8.16.3@o2ib6:15/0 lens 488/440 e 1 to 0 dl 1564610595 ref 1 fl Interpret:/2/0 rc 0/0 Jul 31 15:03:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 31 15:03:04 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f268f63b850 x1632262396758736/t0(0) o4->d0718430-19b7-83a1-60a9-a08aa6574bf3@10.8.11.11@o2ib6:4/0 lens 504/448 e 1 to 0 dl 1564610584 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with d0718430-19b7-83a1-60a9-a08aa6574bf3 (at 10.8.11.11@o2ib6), client will retry: rc = -110 Jul 31 15:03:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ff3e4c3f-1c01-9265-607f-2ed1a0a98c7a (at 10.8.30.2@o2ib6), client will retry: rc = -110 Jul 31 15:03:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 
10.8.16.3@o2ib6), client will retry: rc -110 Jul 31 15:03:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a501b92b-e7b6-1a0d-e95a-8363a690f102 (at 10.8.11.28@o2ib6), client will retry: rc = -110 Jul 31 15:03:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 15:03:17 fir-md1-s1 kernel: LustreError: 22958:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2af965c850 x1638889578455120/t0(0) o3->97a561e8-9c27-c149-cdf8-264b680ede23@10.8.28.12@o2ib6:29/0 lens 488/440 e 1 to 0 dl 1564610609 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:03:17 fir-md1-s1 kernel: LustreError: 22958:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 15:03:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 31 15:03:20 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f2359868050 x1631541239820336/t0(0) o4->a4b592da-2625-e33f-35a6-c499a30b25eb@10.8.12.22@o2ib6:20/0 lens 504/448 e 1 to 0 dl 1564610600 ref 1 fl Interpret:/2/0 rc 0/0 Jul 31 15:03:20 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 11 previous similar messages Jul 31 15:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 2da7ed9b-a80c-b1ee-6b0b-514ba4c7a01e (at 10.8.30.32@o2ib6), client will retry: rc = -110 Jul 31 15:03:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 31 15:03:25 fir-md1-s1 kernel: Lustre: 23588:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564610598/real 1564610598] req@ffff8f286fcf0c00 x1636752187806592/t0(0) o104->fir-MDT0000@10.8.15.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564610605 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:03:25 fir-md1-s1 kernel: Lustre: 23588:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 
previous similar messages Jul 31 15:03:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f0c29e6b3c0/0x5d9ee6a2b7811fa0 lrc: 3/0,0 mode: PW/PW res: [0x200029d4c:0x3bd0:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.27.6@o2ib6 remote: 0x4cd70d72c4df897e expref: 8679 pid: 23660 timeout: 3725666 lvb_type: 0 Jul 31 15:03:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jul 31 15:03:27 fir-md1-s1 kernel: LustreError: 97670:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2ecd022c00 ns: mdt-fir-MDT0000_UUID lock: ffff8f1a419a21c0/0x5d9ee6a2b78d3ed0 lrc: 3/0,0 mode: PW/PW res: [0x200029d4c:0x3bd0:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.8.27.6@o2ib6 remote: 0x4cd70d72c4df89af expref: 5425 pid: 97670 timeout: 0 lvb_type: 0 Jul 31 15:03:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 31 15:03:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 88ec999f-c6f4-0281-c377-b70d1594553b (at 10.8.12.29@o2ib6), client will retry: rc = -110 Jul 31 15:03:45 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Jul 31 15:03:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 31 15:03:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 15:03:47 fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1eda758050 x1640612854083232/t0(0) o3->7cf3030e-db33-f207-7475-35dcd140568b@10.8.17.26@o2ib6:9/0 lens 488/440 e 0 to 0 dl 1564610649 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:03:47 
fir-md1-s1 kernel: LustreError: 21453:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 15:03:53 fir-md1-s1 kernel: LustreError: 55550:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2fa9f58850 x1631576809958272/t0(0) o256->65ff3bb4-90ff-8a06-f846-6eb8a70a7d0e@10.8.8.11@o2ib6:23/0 lens 304/240 e 0 to 0 dl 1564610633 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:03:53 fir-md1-s1 kernel: LustreError: 55550:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 21 previous similar messages Jul 31 15:04:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with cb7e57fe-8059-4e8c-8618-95db5afecaa6 (at 10.8.24.21@o2ib6), client will retry: rc = -110 Jul 31 15:04:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 31 15:04:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.15.9@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0f4e0db600/0x5d9ee6a2b744ef88 lrc: 3/0,0 mode: PR/PR res: [0x2c002bdde:0xc00c:0x0].0x0 bits 0x13/0x0 rrc: 50 type: IBT flags: 0x60200400000020 nid: 10.8.15.9@o2ib6 remote: 0x4b2b7f7b65972510 expref: 5821 pid: 23691 timeout: 3725721 lvb_type: 0 Jul 31 15:04:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Jul 31 15:04:22 fir-md1-s1 kernel: LustreError: 22059:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f313c1af050 x1633757293621104/t0(0) o4->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:19/0 lens 488/448 e 0 to 0 dl 1564610689 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:04:22 fir-md1-s1 kernel: LustreError: 22059:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 31 15:04:23 fir-md1-s1 kernel: Lustre: 21415:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f28ca3b5700 x1636450661945632/t355621168043(0) o36->59f098aa-fb21-8ed8-84bd-d0ce06cad654@10.9.102.46@o2ib4:22/0 lens 520/416 e 0 to 0 dl 1564610662 ref 1 fl Complete:/0/0 rc 0/0 Jul 31 15:04:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with e5fcc30b-a575-210f-f263-a974ce8eedc2 (at 10.8.16.3@o2ib6), client will retry: rc -110 Jul 31 15:04:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 15:04:32 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564610665/real 1564610665] req@ffff8f2defacc500 x1636752188473344/t0(0) o104->fir-MDT0002@10.8.17.26@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564610672 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:04:32 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jul 31 15:04:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.8.3@o2ib6, removing former export from same NID Jul 31 15:04:39 fir-md1-s1 kernel: Lustre: Skipped 862 previous similar messages Jul 31 15:09:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 15:09:40 fir-md1-s1 kernel: Lustre: Skipped 1604 previous similar messages Jul 31 15:12:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 15:12:33 fir-md1-s1 kernel: LustreError: Skipped 739 previous similar messages Jul 31 15:12:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 15:12:48 fir-md1-s1 kernel: Lustre: Skipped 2406 previous similar messages Jul 31 15:12:52 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Jul 31 15:12:52 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (106): c: 8, oc: 0, rc: 8 Jul 31 15:15:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 15:15:49 fir-md1-s1 kernel: Lustre: Skipped 121 previous similar messages Jul 31 15:20:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 15:20:10 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 31 15:21:17 fir-md1-s1 kernel: Lustre: 21456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564611670/real 1564611670] req@ffff8f1df341d700 x1636752312549280/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564611677 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:22:25 fir-md1-s1 kernel: Lustre: 23704:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564611738/real 1564611738] req@ffff8f33a1192d00 x1636752315510144/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564611745 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:22:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 15:23:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 15:23:02 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Jul 31 15:26:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 15:26:18 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 15:28:59 fir-md1-s1 kernel: Lustre: 20555:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564612132/real 1564612132] req@ffff8f24d00c4800 x1636752331713024/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564612139 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:30:10 fir-md1-s1 kernel: Lustre: 23742:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564612203/real 1564612203] req@ffff8f2ea3e63300 x1636752334324432/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564612210 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 15:30:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 15:30:31 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 15:32:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2c8f2b1c00, cur 1564612362 expire 1564612212 last 1564612135 Jul 31 15:32:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 15:33:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 15:33:04 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Jul 31 15:34:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 31 15:34:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 15:36:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 15:36:47 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Jul 31 15:40:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 15:40:37 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 31 15:43:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 15:43:04 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 31 15:46:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 15:46:54 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 31 15:47:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 15:47:19 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 15:47:35 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1bdb1e4450 x1631839968738976/t0(0) o4->534d8def-8eaf-62e4-3c03-cd3608d37c89@10.8.21.13@o2ib6:19/0 lens 504/448 e 1 to 0 dl 1564613269 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:47:35 fir-md1-s1 kernel: LustreError: 46534:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 15:47:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 534d8def-8eaf-62e4-3c03-cd3608d37c89 (at 10.8.21.13@o2ib6), client will retry: rc = -110 Jul 31 15:47:35 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 31 15:47:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 97a561e8-9c27-c149-cdf8-264b680ede23 (at 10.8.28.12@o2ib6), client will retry: rc -110 Jul 31 15:47:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 15:47:43 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f311c7d6300 x1639440535584416/t0(0) o36->f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b@10.8.0.65@o2ib6:18/0 lens 488/2888 e 1 to 0 dl 1564613268 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 15:47:43 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 111 previous similar messages Jul 31 15:47:48 fir-md1-s1 kernel: LustreError: 21890:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1a52e01850 x1638248161539472/t0(0) o37->83b4afa2-a367-a71c-8602-481ad43297ce@10.8.0.68@o2ib6:18/0 lens 448/440 e 1 to 0 dl 1564613268 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:47:48 fir-md1-s1 kernel: LustreError: 21890:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 68 previous similar messages Jul 31 15:47:49 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ 
Request took longer than estimated (20:1s); client may timeout. req@ffff8f311c7d6300 x1639440535584416/t355628065103(0) o36->f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b@10.8.0.65@o2ib6:18/0 lens 488/424 e 1 to 0 dl 1564613268 ref 1 fl Complete:/0/0 rc 0/0 Jul 31 15:47:55 fir-md1-s1 kernel: LustreError: 24566:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2b6371c850 x1638719556719216/t0(0) o4->6b5a58e8-f6cd-7144-fe7f-c8e072c14f3d@10.8.22.25@o2ib6:9/0 lens 504/448 e 1 to 0 dl 1564613289 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:47:55 fir-md1-s1 kernel: LustreError: 24566:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 15:47:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6b5a58e8-f6cd-7144-fe7f-c8e072c14f3d (at 10.8.22.25@o2ib6), client will retry: rc = -110 Jul 31 15:47:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 15:48:03 fir-md1-s1 kernel: LustreError: 29830:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f344f91ac50 x1631774823373216/t0(0) o4->ee2fc29b-a70b-6d11-2477-6e5c3f3348b3@10.8.20.18@o2ib6:3/0 lens 504/448 e 1 to 0 dl 1564613283 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 15:50:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 15:50:47 fir-md1-s1 kernel: Lustre: Skipped 424 previous similar messages Jul 31 15:53:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 15:53:31 fir-md1-s1 kernel: Lustre: Skipped 695 previous similar messages Jul 31 15:56:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 15:56:58 fir-md1-s1 kernel: Lustre: Skipped 283 previous similar messages Jul 31 15:57:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 15:57:31 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 31 16:00:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 16:00:56 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 16:04:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 16:04:42 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 31 16:07:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Jul 31 16:07:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 16:09:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 16:09:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 16:10:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f191d009-08d1-2ed7-450f-b5fd9785f522 (at 10.8.24.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2507b0c000, cur 1564614614 expire 1564614464 last 1564614387 Jul 31 16:11:20 fir-md1-s1 kernel: Lustre: 20383:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f9c5e8450 x1631539267086992/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:25/0 lens 304/240 e 1 to 0 dl 1564614685 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 16:11:20 fir-md1-s1 kernel: Lustre: 20383:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Jul 31 16:11:25 fir-md1-s1 kernel: LustreError: 55552:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2f9c5e8450 x1631539267086992/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:25/0 lens 304/240 e 1 to 0 dl 1564614685 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 16:12:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06d24d01-86fd-11a8-6dcf-d16043d84c98 (at 10.8.25.8@o2ib6) reconnecting Jul 31 16:12:22 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 31 16:15:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9b66bf74-4165-ae6a-63b8-3cf80fe40a18 (at 10.8.20.27@o2ib6) Jul 31 16:15:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 16:17:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.25.8@o2ib6, removing former export from same NID Jul 31 16:17:59 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Jul 31 16:18:30 fir-md1-s1 kernel: Lustre: 55544:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d0cadac50 x1631341009864272/t0(0) o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:5/0 lens 304/240 e 0 to 0 dl 1564615115 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 16:18:35 fir-md1-s1 kernel: LustreError: 55552:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2d0cadac50 x1631341009864272/t0(0) 
o256->00eb2007-a588-b422-45b7-3483c5c8a03a@10.8.25.8@o2ib6:5/0 lens 304/240 e 0 to 0 dl 1564615115 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 16:20:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.25.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 16:20:53 fir-md1-s1 kernel: LustreError: Skipped 14 previous similar messages Jul 31 16:21:33 fir-md1-s1 kernel: Lustre: 27318:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564615286/real 1564615286] req@ffff8f2f0e891500 x1636752427074064/t0(0) o106->fir-MDT0000@10.8.18.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564615293 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 16:21:33 fir-md1-s1 kernel: LustreError: 20471:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1d822e3050 x1636518397828320/t0(0) o37->3429bec6-fe2a-19ec-4f0c-bb576fed4ff4@10.8.29.4@o2ib6:27/0 lens 448/440 e 0 to 0 dl 1564615317 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 16:21:33 fir-md1-s1 kernel: LustreError: 20471:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 31 16:21:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6), client will retry: rc = -110 Jul 31 16:21:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 16:21:36 fir-md1-s1 kernel: LustreError: 24568:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2cb466a450 x1633731076187024/t0(0) o4->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:29/0 lens 488/448 e 0 to 0 dl 1564615319 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 16:21:36 fir-md1-s1 kernel: LustreError: 24568:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 7 previous similar messages Jul 31 16:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client 
will retry: rc = -110 Jul 31 16:21:44 fir-md1-s1 kernel: Lustre: 71821:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a77636000 x1639994755232688/t0(0) o37->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:19/0 lens 448/440 e 1 to 0 dl 1564615309 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 16:21:49 fir-md1-s1 kernel: LustreError: 71862:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2a77636000 x1639994755232688/t0(0) o37->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:19/0 lens 448/440 e 1 to 0 dl 1564615309 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 16:22:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0492f6eb-1534-1b6f-6c14-8d5d26a98b60 (at 10.8.10.35@o2ib6) reconnecting Jul 31 16:22:22 fir-md1-s1 kernel: Lustre: Skipped 385 previous similar messages Jul 31 16:22:53 fir-md1-s1 kernel: Lustre: 21412:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d9a943000 x1631857393594288/t0(0) o101->1dccfe10-92fc-f925-ce99-469da8f9fab0@10.8.8.19@o2ib6:28/0 lens 480/568 e 0 to 0 dl 1564615378 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 16:22:53 fir-md1-s1 kernel: Lustre: 21412:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jul 31 16:23:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 31 16:23:13 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (107): c: 8, oc: 0, rc: 8 Jul 31 16:25:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 16:25:40 fir-md1-s1 kernel: Lustre: Skipped 611 previous similar messages Jul 31 16:26:56 fir-md1-s1 kernel: LustreError: 29831:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f325360cc50 
x1631770217163904/t0(0) o4->bf3478cc-569b-5c14-1a71-20ca1e1f08aa@10.8.12.12@o2ib6:25/0 lens 488/448 e 0 to 0 dl 1564615645 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 16:26:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6), client will retry: rc = -110 Jul 31 16:26:56 fir-md1-s1 kernel: LustreError: 29831:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 16:31:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 16:31:23 fir-md1-s1 kernel: Lustre: Skipped 187 previous similar messages Jul 31 16:32:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 16:32:58 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 31 16:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23ab95c800, cur 1564616093 expire 1564615943 last 1564615866 Jul 31 16:34:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 16:36:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e9e20a98-f46e-41a9-d359-d2738459757c (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f727e8c00, cur 1564616168 expire 1564616018 last 1564615941 Jul 31 16:36:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 16:36:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 16:36:19 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 16:43:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 16:43:47 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 16:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 16:44:29 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 31 16:46:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 16:46:24 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 16:54:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 16:54:04 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 31 16:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 16:54:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 16:56:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 16:56:27 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Jul 31 16:56:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 16:56:59 fir-md1-s1 kernel: LustreError: Skipped 31 previous similar messages Jul 31 17:02:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 17:04:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 17:04:05 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 31 17:04:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 17:04:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 31 17:06:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 17:06:57 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 31 17:08:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 59f5c312-adc4-b4a9-05e0-8c37d188c47f (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f92b08000, cur 1564618120 expire 1564617970 last 1564617893 Jul 31 17:08:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 17:08:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 17:10:03 fir-md1-s1 kernel: Lustre: 55551:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f05495b6450 x1631539267542912/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:8/0 lens 304/240 e 1 to 0 dl 1564618208 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 17:10:08 fir-md1-s1 kernel: LustreError: 20383:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f05495b6450 x1631539267542912/t0(0) o256->c99924b3-32ea-2e12-d25f-b7eb0a477991@10.8.7.25@o2ib6:8/0 lens 304/240 e 1 to 0 dl 1564618208 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:10:08 fir-md1-s1 kernel: LustreError: 20383:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 31 17:13:43 fir-md1-s1 kernel: LustreError: 20974:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f165372ef00 x1640305525129392/t0(0) o37->08221c4d-680b-0eb0-dfa4-ec6a7d978740@10.8.9.9@o2ib6:26/0 lens 448/440 e 1 to 0 dl 1564618436 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:13:43 fir-md1-s1 kernel: LustreError: 20974:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Jul 31 17:13:44 fir-md1-s1 kernel: LustreError: 71863:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f312686c800 x1633757295951680/t0(0) o37->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:6/0 lens 448/440 e 0 to 0 dl 1564618446 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:13:44 fir-md1-s1 kernel: LustreError: 71863:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 17:13:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc = -110 Jul 31 17:13:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 31 17:13:46 fir-md1-s1 kernel: LustreError: 20471:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f198b6d6300 
x1636518398479552/t0(0) o37->3429bec6-fe2a-19ec-4f0c-bb576fed4ff4@10.8.29.4@o2ib6:29/0 lens 448/440 e 1 to 0 dl 1564618439 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:13:46 fir-md1-s1 kernel: LustreError: 20471:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Jul 31 17:13:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client will retry: rc = -110 Jul 31 17:13:50 fir-md1-s1 kernel: Lustre: 21447:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564618423/real 1564618423] req@ffff8f16c108ef00 x1636752452378016/t0(0) o104->fir-MDT0000@10.8.27.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564618430 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Jul 31 17:13:50 fir-md1-s1 kernel: Lustre: 21447:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jul 31 17:13:50 fir-md1-s1 kernel: LustreError: 20974:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1df52b8600 x1637405757947728/t0(0) o37->65c7cbb7-edd7-61f5-c144-1ffbb9efedd7@10.8.1.35@o2ib6:14/0 lens 448/440 e 0 to 0 dl 1564618454 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:13:50 fir-md1-s1 kernel: LustreError: 20974:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Jul 31 17:13:51 fir-md1-s1 kernel: Lustre: 46568:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a3da58450 x1634623366583056/t0(0) o4->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:26/0 lens 488/448 e 1 to 0 dl 1564618436 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 17:13:56 fir-md1-s1 kernel: LustreError: 69438:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1a3da58450 x1634623366583056/t0(0) o4->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:26/0 lens 488/448 e 1 to 0 dl 1564618436 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:13:56 fir-md1-s1 
kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client will retry: rc = -110 Jul 31 17:13:57 fir-md1-s1 kernel: Lustre: 10585:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564618430/real 1564618430] req@ffff8f27b5ed6000 x1636752452389120/t0(0) o106->fir-MDT0002@10.8.28.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564618437 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Jul 31 17:13:57 fir-md1-s1 kernel: Lustre: 10585:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jul 31 17:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with efb86e40-78e4-0377-026b-476ce03a25a4 (at 10.8.28.1@o2ib6), client will retry: rc -110 Jul 31 17:13:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 17:14:00 fir-md1-s1 kernel: LustreError: 21013:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f3491ffd100 x1640215892868544/t0(0) o37->296d97ff-0de3-b3eb-25b6-28238cfb0a2e@10.8.9.8@o2ib6:0/0 lens 448/440 e 1 to 0 dl 1564618440 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:14:02 fir-md1-s1 kernel: Lustre: 97671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564618435/real 1564618435] req@ffff8f18b8b44500 x1636752452419808/t0(0) o106->fir-MDT0002@10.8.8.24@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564618442 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 17:14:04 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1a3da5f450 x1638888277515488/t0(0) o3->efb86e40-78e4-0377-026b-476ce03a25a4@10.8.28.1@o2ib6:2/0 lens 488/440 e 0 to 0 dl 1564618472 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:14:04 fir-md1-s1 kernel: LustreError: 27602:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Jul 31 17:14:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 
efb86e40-78e4-0377-026b-476ce03a25a4 (at 10.8.28.1@o2ib6), client will retry: rc -110 Jul 31 17:14:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 62873e5a-5401-394e-2139-5fd47462d1df (at 10.8.29.2@o2ib6), client will retry: rc = -110 Jul 31 17:14:07 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 31 17:14:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Jul 31 17:14:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 17:14:07 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Jul 31 17:14:07 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Jul 31 17:14:10 fir-md1-s1 kernel: LustreError: 29830:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f313c1adc50 x1635718553560928/t0(0) o3->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:10/0 lens 488/440 e 1 to 0 dl 1564618450 ref 1 fl Interpret:/2/0 rc 0/0 Jul 31 17:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 666b60d6-ed92-c98b-c78c-4bfc3f3e7231 (at 10.8.16.2@o2ib6), client will retry: rc -110 Jul 31 17:14:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Jul 31 17:14:10 fir-md1-s1 kernel: LustreError: 29830:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Jul 31 17:14:15 fir-md1-s1 kernel: Lustre: 27021:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d822e7c50 x1640305525130768/t0(0) o37->08221c4d-680b-0eb0-dfa4-ec6a7d978740@10.8.9.9@o2ib6:20/0 lens 448/440 e 0 to 0 dl 1564618460 ref 2 fl Interpret:/2/0 rc 0/0 Jul 31 17:14:15 fir-md1-s1 kernel: Lustre: 27021:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Jul 31 17:14:20 
fir-md1-s1 kernel: LustreError: 27059:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1d822e7c50 x1640305525130768/t0(0) o37->08221c4d-680b-0eb0-dfa4-ec6a7d978740@10.8.9.9@o2ib6:20/0 lens 448/440 e 0 to 0 dl 1564618460 ref 1 fl Interpret:/2/0 rc 0/0 Jul 31 17:14:20 fir-md1-s1 kernel: LustreError: 27059:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Jul 31 17:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with d1270a98-b38a-ceeb-2b47-7d833ab93d6e (at 10.8.25.5@o2ib6), client will retry: rc = -110 Jul 31 17:14:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 17:14:24 fir-md1-s1 kernel: Lustre: 21333:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564618457/real 1564618457] req@ffff8f336a6a8300 x1636752452497680/t0(0) o104->fir-MDT0002@10.8.8.19@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564618464 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jul 31 17:14:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f0b527cb3c0/0x5d9ee6a305eb0adc lrc: 3/0,0 mode: PR/PR res: [0x200029d54:0x1dd8e:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a817785a5f990 expref: 100758 pid: 23683 timeout: 3733524 lvb_type: 0 Jul 31 17:14:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Jul 31 17:14:26 fir-md1-s1 kernel: LustreError: 46535:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f19fc8d1c50 x1635718553560928/t0(0) o3->9dcf2f2b-339d-b96d-0792-e79b27f28969@10.8.28.2@o2ib6:1/0 lens 488/440 e 0 to 0 dl 1564618471 ref 1 fl Interpret:/2/0 rc 0/0 Jul 31 17:14:26 fir-md1-s1 kernel: LustreError: 46535:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 7 previous similar messages 
Jul 31 17:14:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 9dcf2f2b-339d-b96d-0792-e79b27f28969 (at 10.8.28.2@o2ib6), client will retry: rc -110 Jul 31 17:14:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 17:14:27 fir-md1-s1 kernel: LustreError: 23688:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0f7c3a6600 x1636752452529648/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Jul 31 17:14:27 fir-md1-s1 kernel: LustreError: 23688:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Jul 31 17:14:43 fir-md1-s1 kernel: LustreError: 55551:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f05495b4850 x1631322448500976/t0(0) o256->8b36b49e-1ae8-f1ed-ba47-3e9e02ce1996@10.8.26.24@o2ib6:13/0 lens 304/240 e 0 to 0 dl 1564618483 ref 1 fl Interpret:/0/0 rc 0/0 Jul 31 17:14:43 fir-md1-s1 kernel: LustreError: 55551:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 4 previous similar messages Jul 31 17:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ab8fb752-7566-0b9b-4be7-749799d2e5da (at 10.8.24.22@o2ib6) reconnecting Jul 31 17:14:44 fir-md1-s1 kernel: Lustre: Skipped 797 previous similar messages Jul 31 17:14:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 5ef45f19-459d-828d-fcff-ba0df2051c6a (at 10.8.15.8@o2ib6), client will retry: rc = -110 Jul 31 17:14:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 17:15:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Jul 31 17:15:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (107): c: 8, oc: 0, rc: 8 Jul 31 17:16:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 17:16:58 fir-md1-s1 kernel: 
Lustre: Skipped 1553 previous similar messages Jul 31 17:24:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 17:24:55 fir-md1-s1 kernel: Lustre: Skipped 309 previous similar messages Jul 31 17:25:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 17:25:01 fir-md1-s1 kernel: Lustre: Skipped 463 previous similar messages Jul 31 17:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 17:27:00 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Jul 31 17:27:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 17:27:32 fir-md1-s1 kernel: LustreError: Skipped 311 previous similar messages Jul 31 17:34:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 17:34:57 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 31 17:35:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 17:35:29 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 17:37:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 17:37:26 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Jul 31 17:44:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 17:44:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 31 17:45:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 31 17:45:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 17:45:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 17:45:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 17:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 17:47:30 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 31 17:52:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16efb9c400, cur 1564620740 expire 1564620590 last 1564620513 Jul 31 17:52:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Jul 31 17:55:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 17:55:01 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 31 17:55:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 17:55:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 17:56:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 17:56:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 17:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 17:57:32 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Jul 31 17:58:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a34c72d9-ecf6-8fc7-45cb-c6030c1bcdd4 (at 10.9.106.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f197a822800, cur 1564621082 expire 1564620932 last 1564620855 Jul 31 17:59:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e5a2d60-6dab-7efb-843d-411242015e97 (at 10.8.23.11@o2ib6) in 165 seconds. I think it's dead, and I am evicting it. exp ffff8f2ed9631000, cur 1564621158 expire 1564621008 last 1564620993 Jul 31 17:59:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Jul 31 18:00:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 82ba3b55-be99-f4f8-db97-0342123f6f19 (at 10.8.23.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0b6f0f7800, cur 1564621220 expire 1564621070 last 1564620993 Jul 31 18:00:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Jul 31 18:00:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 51232b02-55d0-759d-f12a-dd4d25b5f158 (at 10.8.20.31@o2ib6) in 225 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1b0ad3fc00, cur 1564621234 expire 1564621084 last 1564621009 Jul 31 18:00:34 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Jul 31 18:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 18:05:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 31 18:06:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 18:06:57 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 18:07:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 18:07:44 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Jul 31 18:08:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 18:08:59 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Jul 31 18:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 18:15:44 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Jul 31 18:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 18:18:21 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Jul 31 18:19:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 31 18:19:13 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 18:19:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 18:19:21 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Jul 31 18:25:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 18:25:46 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 18:28:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 18:28:24 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Jul 31 18:30:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Jul 31 18:30:12 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 31 18:30:27 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19f8764500 x1638713957658912/t0(0) o101->746b3d8e-c221-65e3-9e0b-3d48071d79a2@10.9.0.81@o2ib4:2/0 lens 480/568 e 1 to 0 dl 1564623032 ref 2 fl Interpret:/0/0 rc 0/0 Jul 31 18:30:27 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Jul 31 18:30:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.82@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f34effa1b00/0x5d9ee6a31eac0f70 lrc: 3/0,0 mode: PW/PW res: [0x200029f49:0x1e7f:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.0.82@o2ib6 remote: 0xac353b34d6e84bef expref: 5992 pid: 23588 timeout: 3738101 lvb_type: 0 Jul 31 18:35:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 18:35:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 18:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 18:35:51 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Jul 31 18:38:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 18:38:40 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Jul 31 18:40:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 18:40:33 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Jul 31 18:45:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 18:45:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 18:47:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 18:47:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 18:48:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 18:48:42 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 31 18:50:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 18:50:37 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 31 18:56:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 18:56:14 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 31 18:56:32 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:56:43 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:56:44 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:56:48 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:56:52 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:56:52 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Jul 31 18:57:06 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:57:06 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Jul 31 18:57:22 fir-md1-s1 
kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Jul 31 18:57:22 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Jul 31 18:58:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 18:58:57 fir-md1-s1 kernel: Lustre: Skipped 137 previous similar messages Jul 31 18:59:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 18:59:30 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 19:01:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 19:01:42 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 31 19:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 19:06:14 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 19:09:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 19:09:03 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Jul 31 19:11:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 19:11:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 19:14:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 19:14:41 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 31 19:16:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 19:16:40 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 19:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 19:19:07 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 19:21:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 19:21:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 19:25:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 19:25:02 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 31 19:27:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 19:27:11 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 19:29:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 19:29:09 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Jul 31 19:34:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 19:34:05 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 19:36:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 19:36:11 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 31 19:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 19:37:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 19:39:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 19:39:10 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 31 19:45:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 19:45:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 19:46:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 19:46:44 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Jul 31 19:47:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 19:47:13 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Jul 31 19:49:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 31 19:49:18 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Jul 31 19:53:58 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564627437/real 1564627437] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564628038 ref 1 fl 
Rpc:X/0/ffffffff rc 0/-1 Jul 31 19:53:58 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 19:57:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 19:57:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 19:57:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 19:57:42 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Jul 31 19:59:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 19:59:24 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Jul 31 19:59:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 19:59:24 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 20:04:01 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564628038/real 1564628038] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564628639 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 20:04:01 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 20:08:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 20:08:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Jul 31 20:08:41 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 20:08:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Jul 31 20:10:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 20:10:01 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Jul 31 20:10:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 20:10:01 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Jul 31 20:14:02 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564628641/real 1564628641] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564629242 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 20:14:02 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 20:18:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 20:18:09 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 20:19:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 20:19:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 20:20:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Jul 31 20:20:18 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 31 20:22:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 20:22:08 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Jul 31 20:24:03 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564629242/real 1564629242] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564629843 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 20:24:03 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 20:28:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 20:28:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 20:30:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 20:30:44 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 20:31:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 20:31:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 20:32:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 20:32:58 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Jul 31 20:34:04 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564629843/real 1564629843] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564630444 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 20:34:04 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 20:38:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 20:38:59 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 31 20:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 20:41:11 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 20:43:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 20:43:35 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 31 20:44:05 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564630444/real 1564630444] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564631045 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 20:44:05 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this 
service will wait for recovery to complete Jul 31 20:49:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 20:49:01 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 20:51:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 20:51:31 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Jul 31 20:54:06 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564631045/real 1564631045] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564631646 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 20:54:06 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 20:55:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 20:55:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 20:57:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 20:57:58 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Jul 31 20:59:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 20:59:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 20:59:35 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 31 21:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Jul 31 21:01:33 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Jul 31 21:03:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 21:03:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 21:04:07 fir-md1-s1 kernel: Lustre: 20234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564631646/real 1564631646] req@ffff8f2ea65d2d00 x1636752524644656/t0(0) o6->fir-OST002a-osc-MDT0000@10.0.10.107@o2ib7:28/4 lens 544/432 e 24 to 1 dl 1564632247 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jul 31 21:04:07 fir-md1-s1 kernel: Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Jul 31 21:08:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 21:08:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Jul 31 21:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 21:09:44 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Jul 31 21:11:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 21:11:43 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 21:15:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not 
available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 21:15:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 21:18:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 21:18:10 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 31 21:20:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 21:20:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 21:22:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 21:22:05 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 31 21:25:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 21:25:56 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Jul 31 21:28:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 21:28:34 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Jul 31 21:30:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Jul 31 21:30:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Jul 31 21:32:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 21:32:11 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Jul 31 21:36:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jul 31 21:36:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Jul 31 21:40:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 21:40:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Jul 31 21:41:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 21:41:46 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Jul 31 21:42:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 21:42:41 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Jul 31 21:48:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 21:48:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Jul 31 21:50:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 21:50:30 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 31 21:51:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 21:51:52 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Jul 31 21:52:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 21:52:47 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 31 22:00:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 22:00:56 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 22:01:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Jul 31 22:01:53 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Jul 31 22:02:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 22:02:50 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 31 22:04:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 22:04:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 22:11:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 22:11:09 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Jul 31 22:12:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 22:12:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Jul 31 22:13:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 22:13:04 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Jul 31 22:21:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 22:21:38 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Jul 31 22:21:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 22:21:56 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 31 22:22:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 22:22:03 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 22:23:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 22:23:14 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Jul 31 22:32:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 22:32:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 22:32:16 fir-md1-s1 kernel: Lustre: Skipped 39 previous 
similar messages Jul 31 22:32:16 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 31 22:33:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 22:33:22 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Jul 31 22:43:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 22:43:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 22:43:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 22:43:28 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Jul 31 22:43:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 22:43:28 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Jul 31 22:51:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 22:51:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 22:53:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 22:53:40 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Jul 31 22:53:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 22:53:41 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Jul 31 22:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 22:54:06 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Jul 31 22:57:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 23:04:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Jul 31 23:04:10 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 31 23:04:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Jul 31 23:04:10 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Jul 31 23:04:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 23:04:33 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 23:06:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 23:14:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 23:14:35 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Jul 31 23:14:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 23:14:35 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 31 23:14:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 23:14:55 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Jul 31 23:17:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 23:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 23:24:37 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Jul 31 23:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 23:24:37 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Jul 31 23:25:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 23:25:18 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Jul 31 23:34:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Jul 31 23:34:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Jul 31 23:34:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Jul 31 23:34:39 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Jul 31 23:35:25 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 23:35:25 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Jul 31 23:37:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Jul 31 23:37:36 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Jul 31 23:44:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 23:44:42 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Jul 31 23:45:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Jul 31 23:45:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Jul 31 23:46:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Jul 31 23:46:30 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Jul 31 23:49:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jul 31 23:49:18 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Jul 31 23:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Jul 31 23:54:43 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Jul 31 23:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Jul 31 23:56:00 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Jul 31 23:59:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Jul 31 23:59:49 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 01 00:01:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 00:01:26 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 00:04:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 00:04:47 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 01 00:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 00:06:04 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 00:09:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 00:09:50 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 01 00:15:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 00:15:04 fir-md1-s1 kernel: Lustre: Skipped 111 previous similar messages Aug 01 00:16:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 
10.8.22.20@o2ib6) reconnecting Aug 01 00:16:23 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 01 00:20:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 00:20:57 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 01 00:25:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 00:25:15 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 01 00:26:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 00:26:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 00:31:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 00:31:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 00:32:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 00:32:12 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 00:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 00:35:39 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 01 00:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 00:36:48 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 01 00:43:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 00:43:42 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 01 00:45:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 00:45:43 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 01 00:48:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 00:48:15 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 00:49:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 00:54:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 00:54:06 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 00:55:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 00:55:45 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 01 00:56:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 00:58:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 00:58:18 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 00:58:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 01:01:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 01:04:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 01:04:50 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 01 01:05:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 01:05:54 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 01 01:08:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 01:08:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 01:10:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 01:13:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 01:13:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 01:16:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 01:16:00 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 01:17:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 01:17:41 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 01:18:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 01:18:22 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 01:20:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 01 01:26:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 01:26:18 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 01 01:28:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 01:28:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 01:32:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 01:32:17 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 01 01:36:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 01:36:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 01 01:38:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 01:38:39 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 01 01:39:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 01:42:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 01:42:24 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 01 01:46:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 01:46:23 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 01:49:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 01:49:27 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 01:53:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 01:53:30 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 01:55:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 01:55:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 01:56:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 01:56:24 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 01:59:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 01:59:45 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 01 02:05:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 02:05:11 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 01 02:06:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 02:06:36 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 01 02:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 02:09:48 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 01 02:16:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 02:16:07 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 01 02:16:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f72946800, cur 1564650978 expire 1564650828 last 1564650751 Aug 01 02:16:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 01 02:16:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 02:16:39 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 01 02:20:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 02:20:31 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 01 02:20:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 02:20:40 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 02:27:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 02:27:05 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 02:27:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 02:27:21 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 01 02:28:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 02:30:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 02:30:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 02:32:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 02:37:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 02:37:07 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 01 02:37:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 02:37:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 01 02:40:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 02:40:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 02:41:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 02:41:35 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 01 02:47:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 02:47:26 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 01 02:50:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 02:50:59 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 01 02:51:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 02:51:24 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 02:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 02:52:02 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 02:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 02:58:38 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 03:01:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 03:01:45 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 01 03:03:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 03:03:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 03:03:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 03:03:29 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 03:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 03:08:53 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Aug 01 03:13:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 03:13:21 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 03:14:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 03:14:36 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 01 03:18:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 03:18:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 01 03:18:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 03:18:59 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 03:23:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 03:23:46 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 03:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 03:25:48 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 01 03:29:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 03:29:39 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 01 03:30:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 03:30:30 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 01 03:34:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 03:34:16 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 01 03:36:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 03:36:13 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 01 03:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 03:39:45 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 03:40:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 03:40:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 03:44:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 03:44:16 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 03:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 03:46:33 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 03:50:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 03:50:27 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 01 03:54:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 03:54:51 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 03:56:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 
10.8.12.12@o2ib6) reconnecting Aug 01 03:56:54 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 01 03:59:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f18d41e0800, cur 1564657143 expire 1564656993 last 1564656916 Aug 01 04:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 04:00:46 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 04:03:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 04:03:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 04:05:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 04:05:27 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 01 04:08:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 04:08:11 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 04:10:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 04:10:47 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 04:11:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 04:11:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 04:15:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 01 04:15:43 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 04:17:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 04:18:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 04:18:15 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 01 04:21:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 04:21:56 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 01 04:23:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 04:23:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 04:28:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 04:28:17 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 04:29:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 04:29:07 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 01 04:32:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 04:32:16 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 01 04:38:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 04:38:21 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 04:39:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 04:39:26 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 04:39:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 04:39:30 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 01 04:42:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 04:42:30 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 01 04:48:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 04:48:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 04:49:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 04:49:34 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 04:52:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 04:52:38 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 04:55:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 04:55:13 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 01 04:59:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 04:59:11 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 01 05:00:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 01 05:00:02 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 01 05:02:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 05:02:55 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 01 05:10:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 05:10:03 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 01 05:11:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 05:11:16 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 05:12:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 05:12:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 05:13:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 05:13:10 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 05:20:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 05:20:39 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 05:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2027e649-8bcd-4ca1-6dcb-dd11dcd45e21 (at 10.9.101.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fe54c400, cur 1564662103 expire 1564661953 last 1564661876 Aug 01 05:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 05:22:05 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 05:23:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 05:23:13 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 01 05:27:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 05:27:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 05:30:58 fir-md1-s1 kernel: LustreError: 21606:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.101.17@o2ib4 arrived at 1564662658 with bad export cookie 6746082777628365679 Aug 01 05:30:58 fir-md1-s1 kernel: LustreError: 21606:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1355 previous similar messages Aug 01 05:31:18 fir-md1-s1 kernel: LustreError: 27444:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.101.17@o2ib4 arrived at 1564662678 with bad export cookie 6746082289092235184 Aug 01 05:32:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 05:32:00 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 01 05:32:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 05:32:25 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 05:33:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 05:33:16 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 01 05:38:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 05:38:29 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 05:43:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 05:43:14 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 05:43:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 05:43:19 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 05:43:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 05:43:39 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 05:53:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 05:53:19 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 01 05:53:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 05:53:29 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 01 05:54:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 05:54:36 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 06:02:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 06:02:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 06:03:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 06:03:51 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 01 06:04:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 06:04:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 06:04:41 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 06:04:41 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 06:14:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 06:14:06 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 06:14:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 06:14:56 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 06:16:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 06:16:54 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 06:18:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 06:24:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 06:24:14 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 06:25:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 01 06:25:20 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 01 06:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 06:27:40 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 06:31:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 06:34:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 06:34:18 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 06:36:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 06:36:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 01 06:36:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 06:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 06:38:06 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 06:38:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 06:40:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 06:43:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 06:44:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 06:44:22 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 06:48:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 06:48:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 06:48:21 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 01 06:48:21 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 06:52:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 06:52:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 06:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 06:54:32 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 06:58:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 06:58:42 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 06:59:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 06:59:34 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 07:01:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 07:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 07:04:48 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 07:08:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 07:08:43 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 01 07:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 07:09:50 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 07:15:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 07:15:10 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 01 07:18:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 07:18:49 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 01 07:20:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 07:20:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 01 07:25:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 07:25:48 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 07:28:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 07:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 07:31:09 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 07:31:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 07:31:59 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 07:36:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 07:36:22 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 07:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 07:41:27 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 01 07:44:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 07:44:07 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 01 07:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 07:46:28 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 01 07:51:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) 
reconnecting Aug 01 07:51:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 07:53:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 07:54:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 07:54:59 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 07:56:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 07:57:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 07:57:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 07:57:30 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 01 07:59:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 08:01:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 08:01:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 08:02:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 08:03:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 01 08:06:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 08:06:35 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 08:08:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 08:08:12 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 08:08:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 08:08:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 08:11:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 08:11:55 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 01 08:12:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 08:12:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 08:18:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 08:18:04 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 08:19:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 08:19:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 08:19:35 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 08:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 08:22:14 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 01 08:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 08:29:41 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 01 08:30:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 08:30:06 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 01 08:32:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 08:32:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 08:33:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 08:33:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 08:40:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 08:40:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 08:40:13 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 08:40:13 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 01 08:42:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 08:42:26 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 08:49:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 08:49:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 08:51:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 08:51:45 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 01 08:51:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 08:51:45 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 08:52:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 08:52:28 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 09:00:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 09:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 09:01:47 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 01 09:01:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 09:01:51 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 09:03:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 09:03:12 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 09:11:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 01 09:11:54 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 01 09:11:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 09:11:54 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 01 09:13:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 09:13:40 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 01 09:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f312a0ee400, cur 1564676061 expire 1564675911 last 1564675834 Aug 01 09:14:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 09:22:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 09:22:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 01 09:22:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 09:22:02 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 01 09:23:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 09:23:50 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 09:32:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 09:32:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 01 09:33:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 09:33:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 09:34:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 09:34:16 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 09:39:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 09:39:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 09:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 09:42:11 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 01 09:44:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 09:44:12 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 01 09:44:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 09:44:33 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 01 09:49:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8d0c7653-0e3a-41fb-95c2-ae0301d2a3b3 (at 10.9.114.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3766e66000, cur 1564678190 expire 1564678040 last 1564677963 Aug 01 09:52:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 09:52:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 09:52:34 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 01 09:54:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 09:54:13 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 09:54:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 09:54:49 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 09:58:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 10:02:43 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 10:04:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 10:04:52 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 10:05:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 10:05:17 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 01 10:09:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1da7f1a000, cur 1564679360 expire 1564679210 last 1564679133 Aug 01 10:09:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 10:12:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 10:12:46 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 10:13:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 10:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 10:14:58 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 10:15:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 10:15:34 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 01 10:16:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:17:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:18:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:22:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ed0b26400, cur 1564680130 expire 1564679980 last 1564679903 Aug 01 10:23:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 10:23:12 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 10:25:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 10:25:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 10:27:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 10:27:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 01 10:29:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:29:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 10:29:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:30:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:31:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 10:33:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 10:33:17 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 10:33:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 10:35:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 10:35:48 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 10:37:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 10:37:14 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 10:38:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 10:43:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 10:43:20 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 10:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 10:46:18 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 10:47:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 10:47:26 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 10:54:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 10:54:22 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 01 10:57:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 10:57:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 10:57:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 10:57:39 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 11:01:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5eb6d875-9ed2-91ab-c491-ea832897c4e4 (at 10.9.114.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0b78aae400, cur 1564682484 expire 1564682334 last 1564682257 Aug 01 11:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 11:04:31 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 01 11:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fd69d824-c9ea-0f0c-6ecc-fa05990b5c16 (at 10.9.0.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4518376800, cur 1564682752 expire 1564682602 last 1564682525 Aug 01 11:05:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 11:07:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 11:07:52 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 11:07:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 11:07:52 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 01 11:14:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 11:14:38 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 11:17:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 11:17:59 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 11:18:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 11:18:01 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 11:22:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 11:22:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 11:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 11:25:04 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 01 11:28:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 01 11:28:42 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 11:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 11:28:51 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 01 11:35:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 11:35:10 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 11:39:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 11:39:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 11:39:06 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 11:39:06 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 11:39:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9cb0b481-a543-cf79-4307-a21eb6ac928f (at 10.9.103.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1636e02800, cur 1564684796 expire 1564684646 last 1564684569 Aug 01 11:39:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 11:40:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9cb0b481-a543-cf79-4307-a21eb6ac928f (at 10.9.103.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ecd257c00, cur 1564684814 expire 1564684664 last 1564684587 Aug 01 11:40:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 01 11:45:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 11:45:24 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 11:49:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 11:49:18 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 11:50:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 11:50:08 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 11:51:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 11:52:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 11:53:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 11:53:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 11:54:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0be6d380-cae9-932b-545b-7ee72d9a934d (at 10.9.103.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148c9d7c00, cur 1564685689 expire 1564685539 last 1564685462 Aug 01 11:54:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 11:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0be6d380-cae9-932b-545b-7ee72d9a934d (at 10.9.103.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f452637c400, cur 1564685690 expire 1564685540 last 1564685463 Aug 01 11:55:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 11:55:38 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 01 11:56:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fba6feb3-1d06-9f10-9905-c04ad67c5c45 (at 10.9.115.13@o2ib4) in 224 seconds. I think it's dead, and I am evicting it. exp ffff8f3eebaa9400, cur 1564685765 expire 1564685615 last 1564685541 Aug 01 11:56:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 01 11:56:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e72157a9-7e55-8add-bf61-32d1953542b4 (at 10.9.115.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1478a37c00, cur 1564685768 expire 1564685618 last 1564685541 Aug 01 11:56:08 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 11:59:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 11:59:47 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 12:03:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 12:03:35 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 12:04:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 12:05:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 12:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 12:06:26 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 01 12:09:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 12:09:51 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 12:14:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 01 12:14:05 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 12:15:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 12:16:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 12:16:44 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 01 12:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 12:19:56 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 12:25:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 12:25:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 12:27:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 12:27:08 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 12:29:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 12:30:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 12:30:51 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 12:35:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 12:35:12 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 01 12:37:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 12:37:13 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 01 12:37:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 12:39:42 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 01 12:39:42 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 15 previous similar messages Aug 01 12:39:42 fir-md1-s1 kernel: Lustre: 97668:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564688375/real 1564688376] req@ffff8f2009f1e000 x1636753103348096/t0(0) o104->fir-MDT0000@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564688382 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 01 12:39:46 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 01 12:39:46 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Aug 01 12:39:50 fir-md1-s1 kernel: Lustre: 22005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f170dc41500 x1631744208136384/t0(0) o101->dad5e408-d765-51d9-1659-bc9a52227289@10.9.103.30@o2ib4:25/0 lens 480/568 e 1 to 0 dl 1564688395 ref 2 fl Interpret:/0/0 rc 0/0 Aug 01 12:39:55 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 01 12:39:55 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 01 12:39:56 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f170dc41500 x1631744208136384/t0(0) o101->dad5e408-d765-51d9-1659-bc9a52227289@10.9.103.30@o2ib4:25/0 lens 480/536 e 1 to 0 dl 1564688395 ref 1 fl Complete:/0/0 rc 0/0 Aug 01 12:41:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 12:41:16 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 01 12:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f3c5e369-3c87-4127-5df4-27eec3aac44b (at 10.9.0.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f182bb98800, cur 1564688579 expire 1564688429 last 1564688352 Aug 01 12:42:59 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 01 12:45:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 12:45:14 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 01 12:47:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 12:47:16 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 01 12:51:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 12:51:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 01 12:57:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 12:57:38 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 01 13:00:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 13:00:34 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 01 13:01:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:01:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 13:01:40 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 01 13:07:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 13:07:48 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 01 13:10:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 13:10:59 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 13:11:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 13:11:49 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 13:18:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 13:18:05 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 13:20:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:21:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 13:21:01 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 01 13:21:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 13:21:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 13:21:57 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 01 13:28:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 13:28:06 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 01 13:28:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:31:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 13:31:11 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 01 13:32:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 13:32:09 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 13:32:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5cddcfd9-18d4-b1f6-af7b-face312a7868 (at 10.9.114.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f452439d400, cur 1564691578 expire 1564691428 last 1564691351 Aug 01 13:32:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 13:36:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:36:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 13:38:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 13:38:18 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 01 13:39:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:40:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:41:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 13:41:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 01 13:42:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 13:42:16 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 01 13:43:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:45:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:45:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 13:45:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 13:48:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 13:48:21 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 13:51:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 01 13:51:56 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 01 13:52:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 13:52:40 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 13:56:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 13:58:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1b33e753-8c34-550a-db14-b5d09e53cc8a (at 10.9.103.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ef2b7c000, cur 1564693090 expire 1564692940 last 1564692863 Aug 01 13:58:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 13:58:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 13:58:23 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 01 14:03:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 14:03:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 14:03:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 14:03:52 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 14:08:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 14:08:42 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 01 14:10:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ae8500c00, cur 1564693855 expire 1564693705 last 1564693628 Aug 01 14:10:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 14:12:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 14:13:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 14:13:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 14:14:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 01 14:14:31 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 14:18:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 14:18:47 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 01 14:21:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 14:23:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 14:23:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 14:25:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 14:25:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 14:29:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 14:29:02 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 14:33:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 14:33:32 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 14:35:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 14:35:50 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 01 14:40:06 fir-md1-s1 kernel: Lustre: MGS: 
Connection restored to b119f868-59c1-0af3-bebe-8e0d4b0dc664 (at 10.8.12.26@o2ib6) Aug 01 14:40:06 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 01 14:43:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 14:43:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 46173358-985f-18d8-80fe-b6809fa0d955 (at 10.9.101.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2469f6ec00, cur 1564695833 expire 1564695683 last 1564695606 Aug 01 14:44:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 14:44:06 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 01 14:47:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 14:47:28 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 14:47:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17cf149000, cur 1564696076 expire 1564695926 last 1564695849 Aug 01 14:47:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 01 14:50:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 14:50:23 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 14:52:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 14:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 14:54:23 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 14:57:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 14:57:36 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 01 14:58:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 15:00:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dc413fb8-c6ec-362c-323f-94963e5c6209 (at 10.8.0.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f8b6c000, cur 1564696816 expire 1564696666 last 1564696589 Aug 01 15:02:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 15:02:15 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 15:04:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 15:04:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 15:04:40 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 01 15:07:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 15:07:44 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 01 15:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 15:12:32 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 01 15:14:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 15:14:47 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 15:19:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 15:19:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 01 15:20:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 15:23:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cc75df0a-0970-bacb-f5fe-6f23d41f296b (at 10.9.103.18@o2ib4) Aug 01 15:23:23 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 15:25:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 15:25:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 15:29:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 15:30:25 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564698618/real 1564698618] req@ffff8f42ecc09800 x1636753205782288/t0(0) o104->fir-MDT0002@10.9.103.11@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564698625 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 01 15:30:25 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 01 15:30:32 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564698625/real 1564698625] req@ffff8f42ecc09800 x1636753205782288/t0(0) o104->fir-MDT0002@10.9.103.11@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564698632 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 01 15:30:33 fir-md1-s1 kernel: Lustre: 23582:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f39a4c49200 x1631539430471488/t0(0) o101->a629e15d-d111-d210-8048-dc86df0d6d4e@10.9.105.37@o2ib4:8/0 lens 1776/3288 e 1 to 0 dl 1564698638 ref 2 fl Interpret:/0/0 rc 0/0 Aug 01 15:30:42 fir-md1-s1 kernel: Lustre: 21679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2d0a194800 x1636437454601568/t0(0) o101->62c3a024-34de-fd61-6956-bb3675e9d145@10.8.1.13@o2ib6:17/0 lens 584/3264 e 1 to 0 dl 1564698647 ref 2 fl Interpret:/0/0 rc 0/0 Aug 01 15:30:46 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564698639/real 1564698639] req@ffff8f42ecc09800 x1636753205782288/t0(0) o104->fir-MDT0002@10.9.103.11@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564698646 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 01 15:30:46 fir-md1-s1 kernel: Lustre: 25677:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 01 15:30:48 fir-md1-s1 kernel: Lustre: 
97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1be59b7500 x1638088619704400/t0(0) o101->4ab88333-a9b8-d133-e3a5-acde38b7aab7@10.8.18.30@o2ib6:23/0 lens 584/3264 e 1 to 0 dl 1564698653 ref 2 fl Interpret:/0/0 rc 0/0 Aug 01 15:30:48 fir-md1-s1 kernel: Lustre: 97641:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 01 15:30:53 fir-md1-s1 kernel: LustreError: 25677:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.103.11@o2ib4) failed to reply to blocking AST (req@ffff8f42ecc09800 x1636753205782288 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f075a030240/0x5d9ee6a50a472d57 lrc: 4/0,0 mode: PR/PR res: [0x2c002c3fe:0x1e800:0x0].0x0 bits 0x13/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.103.11@o2ib4 remote: 0x866b4784d189070f expref: 27 pid: 23649 timeout: 3813736 lvb_type: 0 Aug 01 15:30:53 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.103.11@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 01 15:30:53 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.103.11@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f075a030240/0x5d9ee6a50a472d57 lrc: 3/0,0 mode: PR/PR res: [0x2c002c3fe:0x1e800:0x0].0x0 bits 0x13/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.103.11@o2ib4 remote: 0x866b4784d189070f expref: 28 pid: 23649 timeout: 0 lvb_type: 0 Aug 01 15:30:54 fir-md1-s1 kernel: Lustre: 26256:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff8f1be59b7500 x1638088619704400/t0(0) o101->4ab88333-a9b8-d133-e3a5-acde38b7aab7@10.8.18.30@o2ib6:23/0 lens 584/536 e 1 to 0 dl 1564698653 ref 1 fl Complete:/0/0 rc 0/0 Aug 01 15:30:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 15:30:57 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 01 15:32:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3244848a-cf31-82aa-ac84-de5f844d0c7a (at 10.9.103.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d7837f400, cur 1564698775 expire 1564698625 last 1564698548 Aug 01 15:32:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 15:33:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 15:33:56 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 01 15:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 15:35:51 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 15:37:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 15:40:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 15:40:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 15:41:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 15:41:15 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 01 15:43:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 15:43:59 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 15:47:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 15:47:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 01 15:52:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 01 15:52:40 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 01 15:54:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 15:54:24 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 01 15:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b5280270-3b22-224e-0daa-bad5776be543 (at 10.9.103.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f451babf400, cur 1564700185 expire 1564700035 last 1564699958 Aug 01 15:56:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 01 15:57:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 15:57:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 15:59:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 16:01:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 16:03:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 16:03:41 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 16:04:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 16:04:38 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 01 16:07:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 16:07:25 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 16:13:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 16:13:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 16:14:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 16:14:47 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 01 16:17:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 16:17:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 16:18:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 16:24:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 16:24:27 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 16:24:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 16:24:56 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 01 16:25:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 16:26:19 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 01 16:26:19 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 01 16:28:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 16:28:04 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 16:28:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 16:29:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 16:35:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 16:35:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 01 16:35:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 16:35:04 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 16:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f35ce8e0-841e-c995-2dbb-2b088a3a1e16 (at 10.9.103.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d777b9000, cur 1564702539 expire 1564702389 last 1564702312 Aug 01 16:35:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 16:36:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 16:37:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 16:38:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 16:38:48 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 01 16:41:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 16:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 16:46:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 16:46:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 01 16:46:00 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 16:46:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e48fcc13-f8ba-c616-1240-b82b8312e495 (at 10.9.101.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f21579a4400, cur 1564703200 expire 1564703050 last 1564702973 Aug 01 16:46:40 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 01 16:49:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 16:49:10 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 16:50:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 16:50:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 16:55:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 16:55:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 16:56:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 16:56:18 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 16:56:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 16:56:18 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 17:00:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c1c99a8-ad86-5bcb-6177-b95339b4441d (at 10.8.0.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28afddd800, cur 1564704020 expire 1564703870 last 1564703793 Aug 01 17:00:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 01 17:00:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 17:00:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 01 17:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f56f8cf5-b430-a4f9-f6f7-c485c7f965af (at 10.9.114.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0539f4cc00, cur 1564704184 expire 1564704034 last 1564703957 Aug 01 17:03:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 17:05:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 17:05:06 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Aug 01 17:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 17:06:26 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 01 17:06:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 01 17:06:57 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 17:10:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 17:10:24 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 01 17:15:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 17:15:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 17:16:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 17:16:51 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 17:16:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 17:16:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 17:20:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 17:20:34 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 01 17:26:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 17:26:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 01 17:26:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 17:26:51 fir-md1-s1 kernel: Lustre: Skipped 123 previous similar messages Aug 01 17:27:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 01 17:27:09 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 01 17:30:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 17:30:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 17:36:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 17:36:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 01 17:37:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 17:37:08 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 01 17:37:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 17:37:13 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 01 17:40:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 17:40:58 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 01 17:46:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 17:46:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 17:47:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 17:47:14 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Aug 01 17:49:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 17:49:28 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 01 17:51:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 17:51:16 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 17:57:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 17:57:25 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 01 17:59:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 17:59:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 01 17:59:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 17:59:46 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 18:01:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 18:01:31 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 18:07:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 18:07:28 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 01 18:10:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 18:10:17 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 01 18:11:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 18:11:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 18:14:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 18:14:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 18:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2cdac5f0-c6ed-53c8-1db0-a86d6a5d1c62 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4029e44400, cur 1564708516 expire 1564708366 last 1564708289 Aug 01 18:15:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 18:15:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2cdac5f0-c6ed-53c8-1db0-a86d6a5d1c62 (at 10.9.106.54@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f249f632000, cur 1564708528 expire 1564708378 last 1564708301 Aug 01 18:15:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 01 18:17:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 18:17:42 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 01 18:20:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 18:20:53 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 18:21:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 18:21:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 01 18:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 18:24:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 18:27:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 18:27:48 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 01 18:30:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ab272aa-2e03-a9f0-e50d-253560ddae02 (at 10.8.14.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ac976a400, cur 1564709436 expire 1564709286 last 1564709209 Aug 01 18:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4ab272aa-2e03-a9f0-e50d-253560ddae02 (at 10.8.14.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f26031ec000, cur 1564709437 expire 1564709287 last 1564709210 Aug 01 18:31:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 18:31:08 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 01 18:31:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 18:31:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 18:34:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 18:34:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 18:37:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 18:37:59 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 18:42:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 18:42:22 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 18:44:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 18:44:43 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 01 18:48:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 18:48:06 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 01 18:51:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 18:52:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 18:52:47 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 01 18:55:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 18:55:16 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 18:58:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 18:58:07 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 01 18:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 18:59:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 19:03:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 19:05:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 01 19:05:15 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 01 19:06:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 19:06:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 19:08:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 19:08:08 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 19:12:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 01 19:15:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 19:15:19 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 19:16:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 19:16:03 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 01 19:18:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 19:18:15 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 01 19:26:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 19:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 19:26:20 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 19:27:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 19:27:00 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 01 19:28:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 19:28:49 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 01 19:36:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 19:36:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 19:38:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 19:38:19 fir-md1-s1 kernel: Lustre: Skipped 
44 previous similar messages Aug 01 19:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 01 19:39:04 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 01 19:46:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 19:46:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 19:48:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 19:48:29 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 19:49:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 19:49:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 19:49:23 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 01 19:52:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 19:55:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 19:57:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 19:57:03 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 01 19:58:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 19:58:35 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 01 19:59:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 19:59:31 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 01 20:07:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 20:07:10 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 01 20:10:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 20:10:02 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 20:11:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 20:11:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 20:11:11 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 01 20:16:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 20:17:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 20:17:23 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 20:20:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 01 20:20:04 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 20:20:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 20:20:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 20:24:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 20:24:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 20:24:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 20:24:24 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 01 20:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 20:27:38 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 01 20:30:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 20:30:09 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 20:31:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 20:31:38 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 20:37:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 20:37:45 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 20:38:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 20:38:24 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 20:40:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 20:40:13 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 01 20:44:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 20:44:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 20:47:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 20:47:55 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 20:49:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 20:49:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 01 20:50:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 20:50:21 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 20:56:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 20:56:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 20:58:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 20:58:16 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 20:59:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 20:59:30 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 21:00:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 21:00:41 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 21:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 21:09:30 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 21:10:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 21:10:45 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 01 21:10:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 21:10:45 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 01 21:18:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 21:18:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 21:19:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 01 21:19:57 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 01 21:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 21:21:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 21:21:36 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 21:21:36 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 01 21:27:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 21:30:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 21:30:01 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 21:30:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 21:31:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 21:31:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 21:31:58 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 01 21:31:58 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 21:37:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 01 21:37:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 21:40:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 21:40:11 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 01 21:42:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 21:42:17 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 01 21:42:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 21:42:17 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 01 21:47:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 21:47:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 21:50:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 21:50:17 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 21:52:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 21:52:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 21:52:22 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 01 21:52:22 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 01 22:00:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 22:00:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 01 22:02:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 22:02:05 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 01 22:02:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 22:02:30 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 22:03:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 22:03:20 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 01 22:10:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 01 22:10:48 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 01 22:12:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 01 22:12:42 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 01 22:13:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 22:13:26 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 01 22:13:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 22:14:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3253725000, cur 1564722877 expire 1564722727 last 1564722650 Aug 01 22:14:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 01 22:20:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 22:20:56 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 01 22:22:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 22:22:46 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 01 22:26:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 22:26:44 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 01 22:29:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 22:31:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 22:31:57 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 22:33:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 22:33:10 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 22:37:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 22:37:02 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 01 22:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 22:41:58 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 22:43:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 22:43:12 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 01 22:47:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 22:47:07 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 01 22:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 22:51:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 01 22:52:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 22:52:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 01 22:53:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 22:53:18 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 01 22:54:35 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b895677e-a9f2-88aa-4abb-c0d3abe3dc28 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0ba375e400, cur 1564725275 expire 1564725125 last 1564725048 Aug 01 22:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 46c44be6-f537-43f6-ace2-51032c63f050 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0c31e68400, cur 1564725277 expire 1564725127 last 1564725050 Aug 01 22:54:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 01 22:55:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 22:55:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 01 22:57:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 22:57:28 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 01 22:58:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 23:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2cb1d3f800, cur 1564725668 expire 1564725518 last 1564725441 Aug 01 23:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 185 seconds. I think it's dead, and I am evicting it. exp ffff8f1826da7400, cur 1564725744 expire 1564725594 last 1564725559 Aug 01 23:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 01 23:02:24 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 01 23:03:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 23:03:31 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 01 23:06:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ed780966-4946-7c3e-b821-39dfa40ec387 (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2efa166400, cur 1564725990 expire 1564725840 last 1564725763 Aug 01 23:06:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 23:06:37 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 01 23:06:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ed780966-4946-7c3e-b821-39dfa40ec387 (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2b1d7bb400, cur 1564726009 expire 1564725859 last 1564725782 Aug 01 23:06:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 01 23:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 23:08:06 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 01 23:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 23:13:05 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 01 23:13:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 23:13:39 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 01 23:19:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 23:19:33 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 01 23:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 23:23:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 01 23:23:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 23:23:40 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 01 23:27:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 01 23:31:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 23:31:01 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 01 23:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 23:33:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 01 23:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 23:33:53 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 01 23:44:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 01 23:44:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 01 23:44:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 01 23:44:21 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 01 23:45:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 01 23:45:11 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 01 23:54:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 01 23:54:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 01 23:54:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 01 23:54:38 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 01 23:55:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 01 23:55:32 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 01 23:59:27 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 01 23:59:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 00:04:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 00:04:50 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 02 00:04:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 00:04:50 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 02 00:06:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 00:06:39 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 02 00:11:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:12:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:14:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 00:14:56 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 00:14:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 00:14:56 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 00:17:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 00:18:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 00:18:06 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 02 00:18:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 00:24:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 00:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 00:24:56 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 02 00:29:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:29:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 00:29:38 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 00:35:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 00:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 00:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 00:35:01 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 02 00:35:01 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 00:41:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 00:41:45 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 02 00:42:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:45:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 00:45:01 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 00:45:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 00:45:26 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 00:45:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:47:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 00:52:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 00:52:07 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 02 00:52:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:53:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:55:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 00:55:02 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 02 00:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 00:55:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 00:58:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 00:58:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 01:04:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 01:04:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 02 01:05:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 01:05:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 01:05:06 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 02 01:05:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 01:05:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 01:10:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 01:10:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 01:14:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 01:14:28 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 02 01:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 01:15:16 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 02 01:17:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 01:17:05 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 01:25:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 01:25:16 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 02 01:25:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 01:25:41 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 02 01:26:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 01:26:56 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 02 01:27:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 01:27:13 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 02 01:35:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 01:35:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 01:36:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 01:36:30 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 01:37:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 01:37:05 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 02 01:38:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 01:38:04 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 02 01:46:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 01:46:59 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 01:47:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Aug 02 01:47:10 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 01:48:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 01:48:13 fir-md1-s1 kernel: Lustre: Skipped 14532 previous similar messages Aug 02 01:51:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2ead7fc0-b593-2cc8-f5de-7a6048c76cc2 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f279135e000, cur 1564735865 expire 1564735715 last 1564735638 Aug 02 01:57:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 01:57:10 fir-md1-s1 kernel: Lustre: Skipped 14584 previous similar messages Aug 02 01:57:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 01:57:12 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 02 01:58:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 01:58:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 02 01:59:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f17e3cc9800, cur 1564736397 expire 1564736247 last 1564736170 Aug 02 01:59:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 02:07:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 02:07:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 02:07:16 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 02 02:07:16 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 02 02:07:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 57819c23-a706-4cd3-75dd-5bacb51f6163 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f199d6d7c00, cur 1564736844 expire 1564736694 last 1564736617 Aug 02 02:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 02:08:33 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 02:09:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 02:17:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 02:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 02:17:22 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 02 02:17:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 02:17:27 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 02:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 02:18:45 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 02 02:27:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 02:27:23 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 02 02:28:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 02:28:39 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 02 02:29:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 02:29:01 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 02 02:32:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 02:32:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 02:37:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 02:37:46 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 02 02:39:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 02:39:36 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 02:41:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 02:41:07 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 02 02:45:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 02:48:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 02:48:12 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 02 02:50:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 02:50:00 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 02:51:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 02:51:15 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 02:56:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 02:58:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 02:58:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 02:59:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:00:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 03:00:54 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 02 03:04:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 03:04:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 03:05:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:07:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:07:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 03:08:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 03:08:27 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 02 03:11:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 03:11:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 03:16:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 03:16:08 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 02 03:18:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 03:18:28 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 02 03:19:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:21:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 03:21:44 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 02 03:22:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:22:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:24:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 03:26:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 03:26:31 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 02 03:28:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 03:28:39 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 02 03:30:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:32:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 03:32:26 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 02 03:33:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:34:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:37:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 03:37:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 03:38:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 03:39:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 03:39:01 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 02 03:40:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 03:42:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 03:45:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 03:45:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 03:48:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 03:48:59 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 02 03:49:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 03:49:04 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 02 03:53:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 03:53:09 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 03:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 03:59:07 fir-md1-s1 kernel: Lustre: Skipped 36148 previous similar messages Aug 02 04:01:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 04:01:04 fir-md1-s1 kernel: Lustre: Skipped 70 
previous similar messages Aug 02 04:01:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 04:01:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 04:03:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 04:03:44 fir-md1-s1 kernel: Lustre: Skipped 36082 previous similar messages Aug 02 04:09:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 04:09:33 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Aug 02 04:12:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 04:12:11 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 02 04:13:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 04:13:44 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 04:15:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 04:15:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 04:19:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 04:19:39 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 02 04:22:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 04:22:58 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 02 04:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 04:23:56 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 02 04:26:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4dc7d753-4b99-5b32-d42c-fb863151f6cb (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d8029b800, cur 1564745172 expire 1564745022 last 1564744945 Aug 02 04:26:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 04:26:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4dc7d753-4b99-5b32-d42c-fb863151f6cb (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1779f2e000, cur 1564745173 expire 1564745023 last 1564744946 Aug 02 04:28:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 04:28:01 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 02 04:29:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 04:29:54 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 02 04:34:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 04:34:02 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 04:34:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 04:34:34 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 04:40:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 04:40:00 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 04:40:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 04:40:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 04:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bffeb03d-d072-aad0-6c67-57447500af12 (at 10.9.101.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2536abe800, cur 1564746263 expire 1564746113 last 1564746036 Aug 02 04:44:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 02 04:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bffeb03d-d072-aad0-6c67-57447500af12 (at 10.9.101.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f45355fec00, cur 1564746264 expire 1564746114 last 1564746037 Aug 02 04:44:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 91ce94ec-5164-5ed3-bdde-081eb2c8122a (at 10.9.101.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14a0a7a400, cur 1564746266 expire 1564746116 last 1564746039 Aug 02 04:45:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 04:45:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 02 04:45:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 04:45:41 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 04:50:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 04:50:11 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 02 04:53:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 04:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 04:55:42 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 04:57:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 04:57:00 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 02 05:00:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 05:00:25 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 02 05:07:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 05:07:14 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 05:07:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 05:07:14 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 02 05:09:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 05:10:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 05:10:28 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 05:17:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 05:17:21 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 05:19:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 05:19:18 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 05:20:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 05:20:32 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 02 05:22:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 05:22:17 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 05:27:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 05:27:48 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 05:29:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 05:29:53 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 02 05:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 05:30:32 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 02 05:37:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 05:37:50 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 05:39:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 05:39:59 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 02 05:40:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 05:40:41 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 02 05:47:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 05:48:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 05:48:07 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 05:50:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 05:50:02 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 05:50:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 05:50:43 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 02 05:52:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 05:55:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 05:55:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 05:58:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 05:58:17 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 06:00:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 06:00:23 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 06:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 06:00:46 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 06:08:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 06:08:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 06:09:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 06:09:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 06:10:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 06:10:52 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 02 06:11:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 02 06:11:19 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 02 06:18:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 06:18:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 06:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 06:20:54 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 06:23:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 02 06:23:16 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 06:26:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 06:28:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 06:28:36 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 06:30:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 06:30:58 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 02 06:33:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 06:33:56 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 02 06:38:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 06:38:41 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 06:39:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 06:39:37 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 06:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 06:41:03 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 02 06:44:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 06:44:52 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 02 06:49:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 06:49:51 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 02 06:51:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 06:51:05 fir-md1-s1 kernel: Lustre: Skipped 130 previous similar messages Aug 02 06:51:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 06:51:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 06:55:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 06:55:13 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Aug 02 07:00:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 07:00:05 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 07:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 07:01:08 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 02 07:01:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1635b1b000, cur 1564754480 expire 1564754330 last 1564754253 Aug 02 07:03:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 07:03:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 07:05:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 07:05:56 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 07:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 07:10:56 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 07:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 07:11:15 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 02 07:16:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 07:16:35 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 02 07:21:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 07:21:03 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 07:21:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 07:21:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 07:21:26 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 02 07:21:26 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 02 07:28:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 07:28:28 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 07:31:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 07:31:36 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 02 07:31:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 07:31:41 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 07:36:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 07:36:30 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 07:38:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 07:38:35 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 02 07:41:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 07:41:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 07:41:53 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 07:41:53 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Aug 02 07:49:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 07:49:47 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 02 07:50:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 07:51:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 07:51:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 07:51:55 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 07:51:55 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 02 08:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 08:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 08:02:10 fir-md1-s1 kernel: Lustre: Skipped 54920 previous similar messages Aug 02 08:02:10 fir-md1-s1 kernel: Lustre: Skipped 54878 previous similar messages Aug 02 08:02:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 08:02:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 08:04:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 08:04:12 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 08:12:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 08:12:35 fir-md1-s1 kernel: Lustre: Skipped 136530 previous similar messages Aug 02 08:12:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 08:12:35 fir-md1-s1 kernel: Lustre: Skipped 136565 previous similar messages Aug 02 08:14:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 08:15:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 08:15:03 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 02 08:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cab5a879-d102-0d11-f7fe-0f2b0c00937b (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e09a73400, cur 1564758935 expire 1564758785 last 1564758708 Aug 02 08:22:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 08:22:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 08:22:39 fir-md1-s1 kernel: Lustre: Skipped 81292 previous similar messages Aug 02 08:22:39 fir-md1-s1 kernel: Lustre: Skipped 81269 previous similar messages Aug 02 08:25:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 08:25:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 02 08:29:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 08:29:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 08:32:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 08:32:40 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 02 08:32:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 08:32:40 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 08:36:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 08:36:22 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 02 08:40:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 08:40:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 08:42:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33ad5619-42f2-6fd3-6a50-2a5c216c13e4 (at 10.8.2.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24efdcc400, cur 1564760564 expire 1564760414 last 1564760337 Aug 02 08:42:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 08:42:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 08:42:57 fir-md1-s1 kernel: Lustre: Skipped 8118 previous similar messages Aug 02 08:42:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 08:42:57 fir-md1-s1 kernel: Lustre: Skipped 8138 previous similar messages Aug 02 08:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 32392461-3220-2adb-e1e3-d29150c41512 (at 10.9.103.16@o2ib4) in 224 seconds. 
I think it's dead, and I am evicting it. exp ffff8f426cc12c00, cur 1564760640 expire 1564760490 last 1564760416 Aug 02 08:44:00 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 02 08:46:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 08:46:54 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 08:51:00 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564761053/real 1564761053] req@ffff8f0b1e28dd00 x1636753659007520/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564761060 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 08:51:00 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 02 08:51:21 fir-md1-s1 kernel: Lustre: 25675:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564761074/real 1564761074] req@ffff8f3068e6f200 x1636753659137664/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564761081 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 08:51:29 fir-md1-s1 kernel: Lustre: 10148:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564761082/real 1564761082] req@ffff8f2bb2bc0300 x1636753659205936/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564761089 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 08:51:57 fir-md1-s1 kernel: Lustre: 23661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564761110/real 1564761110] req@ffff8f0e5605f500 x1636753659342848/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564761117 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 08:52:05 fir-md1-s1 kernel: Lustre: 25681:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not 
sending early reply req@ffff8f0b48e75400 x1638241941962832/t0(0) o101->27ef9320-178f-53ac-b738-4bc2f228a23d@10.9.0.63@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1564761130 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 08:52:18 fir-md1-s1 kernel: Lustre: 23661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564761131/real 1564761131] req@ffff8f0e5605f500 x1636753659441584/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564761138 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 08:52:18 fir-md1-s1 kernel: Lustre: 23661:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 02 08:52:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 08:52:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 08:53:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 08:53:09 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 08:53:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 08:53:09 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 08:53:12 fir-md1-s1 kernel: Lustre: 23633:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564761185/real 1564761185] req@ffff8f0e5605f800 x1636753659754736/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564761192 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 08:53:12 fir-md1-s1 kernel: Lustre: 23633:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 02 09:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 
09:03:25 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 09:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 09:03:25 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 09:03:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 09:03:26 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 09:04:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 09:04:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 09:07:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0383908c-07f0-7abb-6876-5ff14053ed40 (at 10.8.0.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22c6792c00, cur 1564762079 expire 1564761929 last 1564761852 Aug 02 09:12:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ce60ed000, cur 1564762324 expire 1564762174 last 1564762097 Aug 02 09:12:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 09:13:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 09:13:35 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 09:13:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 09:13:35 fir-md1-s1 kernel: Lustre: Skipped 208363 previous similar messages Aug 02 09:13:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 09:13:36 fir-md1-s1 kernel: Lustre: Skipped 208320 previous similar messages Aug 02 09:16:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3cb329e1-2f08-7b0f-4fac-44bb74846b10 (at 10.8.0.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2a447c2c00, cur 1564762583 expire 1564762433 last 1564762356 Aug 02 09:20:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 09:20:27 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 09:23:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 09:23:45 fir-md1-s1 kernel: Lustre: Skipped 12359 previous similar messages Aug 02 09:23:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 09:23:45 fir-md1-s1 kernel: Lustre: Skipped 12376 previous similar messages Aug 02 09:25:03 fir-md1-s1 kernel: LustreError: 46558:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2233801850 x1631353612053152/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:8/0 lens 488/448 e 0 to 0 dl 1564763108 ref 1 fl Interpret:/0/0 rc 0/0 Aug 02 09:25:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 02 09:25:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 09:25:17 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 02 09:27:36 fir-md1-s1 kernel: LustreError: 46553:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1d6abd2c50 x1631353612074464/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:5/0 lens 488/448 e 0 to 0 dl 1564763285 ref 1 fl Interpret:/0/0 rc 0/0 Aug 02 09:27:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 02 09:32:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 09:32:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 09:33:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 09:33:52 fir-md1-s1 kernel: Lustre: Skipped 14874 previous similar messages Aug 02 09:33:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 09:33:52 fir-md1-s1 kernel: Lustre: Skipped 14883 previous similar messages Aug 02 09:36:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 09:36:33 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 02 09:43:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 09:43:37 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 09:44:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 09:44:03 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 02 09:44:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 09:44:37 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 09:46:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 09:46:49 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 09:54:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 09:54:15 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 02 09:54:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 
10.8.30.23@o2ib6) reconnecting Aug 02 09:54:53 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 09:57:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 02 09:57:15 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 09:57:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 09:57:29 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 10:04:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 10:04:15 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Aug 02 10:05:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 10:05:02 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 02 10:08:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 10:08:39 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 02 10:14:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 10:14:29 fir-md1-s1 kernel: Lustre: Skipped 39767 previous similar messages Aug 02 10:15:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e4d8b0cf-743e-75bd-2d4d-a7525533f059 (at 10.8.12.26@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16d2332000, cur 1564766153 expire 1564766003 last 1564765926 Aug 02 10:15:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 10:16:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 10:16:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 10:16:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 10:16:16 fir-md1-s1 kernel: Lustre: Skipped 39736 previous similar messages Aug 02 10:17:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 33f14af1-afd4-e885-e017-1a60bf66c38a (at 10.9.0.2@o2ib4) in 203 seconds. I think it's dead, and I am evicting it. exp ffff8f4519774800, cur 1564766229 expire 1564766079 last 1564766026 Aug 02 10:17:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 10:17:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a3bd18d8-3db5-df1c-b07f-336571ebc30a (at 10.9.0.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2525f08000, cur 1564766253 expire 1564766103 last 1564766026 Aug 02 10:17:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 02 10:19:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 10:19:01 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 02 10:24:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 10:24:39 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 02 10:26:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 10:26:54 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 02 10:29:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 10:29:33 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 02 10:34:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored 
to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 10:34:46 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 02 10:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a10c808b-85aa-b43d-fbe7-e885004efcf8 (at 10.9.0.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c7e6c8c00, cur 1564767346 expire 1564767196 last 1564767119 Aug 02 10:35:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 10:35:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 10:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 10:37:02 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 10:40:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 10:40:56 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 10:44:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 10:44:48 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 02 10:45:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 10:45:55 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 02 10:47:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 10:47:08 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 02 10:50:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 10:50:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 02 10:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 10:54:50 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 02 10:57:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 10:57:46 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 10:58:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 10:58:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 11:01:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 11:01:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 11:04:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 11:04:58 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 02 11:08:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e823ca92-6dff-c581-921f-f71adf2567a0 (at 10.8.27.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ef1713800, cur 1564769305 expire 1564769155 last 1564769078 Aug 02 11:08:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 11:08:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 11:08:26 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 11:09:24 fir-md1-s1 kernel: Lustre: 21312:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564769357/real 1564769357] req@ffff8f293f71ef00 x1636753696802832/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564769364 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:09:24 fir-md1-s1 kernel: Lustre: 21312:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 02 11:09:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 11:09:38 fir-md1-s1 kernel: Lustre: 27316:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564769371/real 1564769371] req@ffff8f3915211b00 x1636753696838960/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564769378 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:10:21 fir-md1-s1 kernel: Lustre: 23736:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564769414/real 1564769414] req@ffff8f0a71373900 x1636753696966224/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564769421 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:10:21 fir-md1-s1 kernel: Lustre: 23736:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 02 11:11:52 fir-md1-s1 kernel: Lustre: 22282:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564769505/real 1564769505] req@ffff8f2014f6bf00 x1636753697374896/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564769512 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:11:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 11:11:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 11:13:09 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564769582/real 1564769582] req@ffff8f060d32b600 x1636753697677648/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564769589 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:13:09 fir-md1-s1 kernel: Lustre: 23612:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 02 11:15:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3bd18d8-3db5-df1c-b07f-336571ebc30a (at 10.9.0.2@o2ib4) Aug 02 11:15:00 fir-md1-s1 kernel: Lustre: Skipped 43 previous 
similar messages Aug 02 11:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 11:18:45 fir-md1-s1 kernel: Lustre: Skipped 21210 previous similar messages Aug 02 11:22:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 11:22:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 11:23:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 11:23:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 11:25:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 11:25:24 fir-md1-s1 kernel: Lustre: Skipped 21241 previous similar messages Aug 02 11:28:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 11:28:58 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 11:32:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 40b3c666-85bb-7cc6-dce2-ca98ff07da91 (at 10.9.109.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fd935400, cur 1564770777 expire 1564770627 last 1564770550 Aug 02 11:32:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 11:33:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 11:33:36 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 11:34:51 fir-md1-s1 kernel: Lustre: 22282:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564770884/real 1564770884] req@ffff8f1c0781b900 x1636753706944448/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564770891 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:34:51 fir-md1-s1 kernel: Lustre: 22282:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 02 11:35:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 11:35:30 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 02 11:35:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 11:35:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 11:35:53 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564770946/real 1564770946] req@ffff8f107be4ef00 x1636753707411440/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564770953 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:35:53 fir-md1-s1 kernel: Lustre: 21410:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 02 11:36:28 fir-md1-s1 kernel: Lustre: 21413:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564770981/real 1564770981] req@ffff8f34bcaf4e00 x1636753707652512/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564770988 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:38:18 fir-md1-s1 kernel: Lustre: 97662:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564771091/real 1564771091] req@ffff8f1e8a4e6f00 x1636753708322096/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564771098 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 11:38:18 fir-md1-s1 kernel: Lustre: 97662:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 02 11:39:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 11:39:11 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 11:40:29 fir-md1-s1 kernel: Lustre: 21422:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564771222/real 1564771222] req@ffff8f3ea71a8600 x1636753709058992/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564771229 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 02 11:40:29 fir-md1-s1 kernel: Lustre: 21422:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous 
similar messages Aug 02 11:40:33 fir-md1-s1 kernel: Lustre: 10364:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3baa038000 x1638241967226352/t0(0) o101->27ef9320-178f-53ac-b738-4bc2f228a23d@10.9.0.63@o2ib4:8/0 lens 480/568 e 0 to 0 dl 1564771238 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 11:40:36 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f15bcc8e900 x1631353614303296/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:11/0 lens 480/568 e 0 to 0 dl 1564771241 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 11:41:01 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f38de568f00 x1638241967418384/t0(0) o101->27ef9320-178f-53ac-b738-4bc2f228a23d@10.9.0.63@o2ib4:6/0 lens 480/568 e 0 to 0 dl 1564771266 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 11:44:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 11:44:02 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 02 11:45:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 11:45:51 fir-md1-s1 kernel: Lustre: Skipped 1351 previous similar messages Aug 02 11:47:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 11:47:37 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 11:49:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 11:49:24 fir-md1-s1 kernel: Lustre: Skipped 1328 previous similar messages Aug 02 11:55:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 11:55:54 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 02 11:57:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 11:57:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 11:57:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 11:57:39 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 02 11:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 11:59:53 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 12:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 12:05:55 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 12:08:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 12:08:23 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 12:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 12:10:27 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 12:12:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 12:12:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 12:16:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 12:16:26 fir-md1-s1 kernel: Lustre: Skipped 70793 previous similar messages Aug 02 12:18:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 12:18:30 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 02 12:20:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 12:20:33 fir-md1-s1 kernel: Lustre: Skipped 70819 previous similar messages Aug 02 12:21:09 fir-md1-s1 kernel: LustreError: 21365:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1e14725850 x1631353615762000/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:27/0 lens 488/448 e 1 to 0 dl 1564773687 ref 1 fl Interpret:/0/0 rc 0/0 Aug 02 12:21:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 02 12:26:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 12:26:50 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Aug 02 12:29:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 12:29:29 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 02 12:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 12:30:34 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 02 12:36:55 fir-md1-s1 kernel: 
Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 12:36:55 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Aug 02 12:39:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 02 12:39:54 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 02 12:40:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 12:40:44 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 02 12:44:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 12:44:08 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 12:47:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 12:47:09 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 02 12:48:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 12:51:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 12:51:20 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 02 12:51:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 12:52:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 02 12:52:25 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 02 12:57:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 12:57:19 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 02 12:59:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 12:59:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 13:02:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 13:02:29 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 02 13:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 13:03:04 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 13:07:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 13:07:28 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 13:09:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 13:10:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15b712cc00, cur 1564776642 expire 1564776492 last 1564776415 Aug 02 13:10:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 13:13:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 13:13:27 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 02 13:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 13:14:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 02 13:18:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 13:18:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 13:18:42 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564777115/real 1564777115] req@ffff8f0892cb9b00 x1636753734953008/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564777122 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 13:18:42 fir-md1-s1 kernel: Lustre: 27319:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Aug 02 13:23:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 13:23:33 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 13:24:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 13:24:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 13:24:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 13:24:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 13:28:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 13:28:11 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 13:33:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 13:33:53 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 13:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 13:34:53 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 02 13:35:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 13:35:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 13:38:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 13:38:14 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 02 13:44:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 13:44:19 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 13:45:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 13:45:15 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 02 13:45:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 13:45:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 13:48:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 13:48:20 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 02 13:55:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 13:55:37 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 13:55:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 13:55:39 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 02 13:56:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 13:56:43 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 02 13:58:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 13:58:27 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 02 14:06:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 14:06:17 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 14:06:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 14:06:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 02 14:08:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 14:08:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 02 14:14:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 14:16:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 14:16:35 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 14:17:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 02 14:17:09 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 14:18:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 14:18:47 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 02 14:26:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 14:26:41 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 02 14:27:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 14:27:17 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 02 14:28:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 14:28:53 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 02 14:36:40 fir-md1-s1 kernel: Lustre: 27316:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564781789/real 1564781789] req@ffff8f362a2d6c00 x1636753756520224/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564781800 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 14:36:44 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bed674e00 x1636364344267760/t0(0) o101->6660433e-6178-3b9d-5600-564c37c5d5bd@10.8.8.26@o2ib6:19/0 lens 576/3264 e 1 to 0 dl 1564781809 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 14:36:44 fir-md1-s1 kernel: Lustre: 
21428:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 33 previous similar messages Aug 02 14:36:45 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2521796600 x1636456249800944/t0(0) o101->39551ddb-54d0-7699-e7af-d92b0f7ad265@10.9.108.38@o2ib4:20/0 lens 576/3264 e 1 to 0 dl 1564781810 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 14:36:45 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 117 previous similar messages Aug 02 14:36:46 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f251d2a3600 x1631623545910912/t0(0) o101->cd6a890f-3ae3-4002-9a6f-a0a5b59c9ffb@10.8.7.9@o2ib6:21/0 lens 576/3264 e 1 to 0 dl 1564781811 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 14:36:46 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 130 previous similar messages Aug 02 14:36:48 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19b1195a00 x1631562897252832/t0(0) o101->aba1b9f6-95e7-026a-a428-641863a8cbf1@10.9.103.3@o2ib4:23/0 lens 576/0 e 1 to 0 dl 1564781813 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 02 14:36:48 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 225 previous similar messages Aug 02 14:36:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 003dd27d-2bba-4ec3-7504-b78915027b95 (at 10.9.0.1@o2ib4) reconnecting Aug 02 14:36:50 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 02 14:36:51 fir-md1-s1 kernel: Lustre: 27316:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564781800/real 1564781800] req@ffff8f362a2d6c00 x1636753756520224/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564781811 
ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 02 14:36:51 fir-md1-s1 kernel: Lustre: 23576:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f0970049200 x1631565010043312/t0(0) o101->47b964ff-80d0-9697-7d31-8a69bef7e672@10.9.104.47@o2ib4:20/0 lens 576/592 e 1 to 0 dl 1564781810 ref 1 fl Complete:/0/0 rc 0/0 Aug 02 14:36:51 fir-md1-s1 kernel: Lustre: 23576:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 58 previous similar messages Aug 02 14:37:45 fir-md1-s1 kernel: Lustre: 22005:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564781858/real 1564781858] req@ffff8f22f7abe900 x1636753756680480/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564781865 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 02 14:38:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 02 14:38:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 14:38:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 14:38:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 14:38:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 14:38:57 fir-md1-s1 kernel: Lustre: Skipped 229 previous similar messages Aug 02 14:40:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 14:44:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 14:44:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 14:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 14:47:16 fir-md1-s1 kernel: Lustre: Skipped 183 previous similar messages Aug 02 14:48:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 14:48:46 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 02 14:48:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 14:48:58 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Aug 02 14:50:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 14:50:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 14:54:43 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 91854641-0e88-dbbb-3427-cc488e7ad499 (at 10.9.115.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f080aa6a000, cur 1564782883 expire 1564782733 last 1564782656 Aug 02 14:55:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 583662d3-f372-2a35-f933-993770a94606 (at 10.9.115.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148a58d000, cur 1564782900 expire 1564782750 last 1564782673 Aug 02 14:55:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 02 14:57:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 14:57:30 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 14:59:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 14:59:10 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 02 15:03:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 15:03:12 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 02 15:07:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 15:07:32 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 02 15:08:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 15:08:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 15:09:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 15:09:20 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 02 15:13:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 15:13:18 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 15:18:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 15:18:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 15:19:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 15:19:23 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 02 15:20:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 15:20:07 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 02 15:24:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 15:24:29 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 02 15:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 15:28:18 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 15:28:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25207f1000, cur 1564784925 expire 1564784775 last 1564784698 Aug 02 15:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 15:29:37 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 15:31:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 15:31:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 15:32:36 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f223dddf800 x1639095590693360/t0(0) o101->5693ead1-966a-5c7b-d780-e3a04a8813d1@10.8.10.4@o2ib6:11/0 lens 1776/3288 e 1 to 0 dl 1564785161 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 15:32:36 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 208 previous similar messages Aug 02 15:32:38 fir-md1-s1 kernel: Lustre: 97653:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a1d8de000 x1638282767335600/t0(0) o101->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:13/0 lens 592/3264 e 1 to 0 dl 1564785163 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 15:32:38 fir-md1-s1 kernel: Lustre: 97653:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Aug 02 15:35:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 15:35:34 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 15:38:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 15:38:20 fir-md1-s1 kernel: Lustre: Skipped 1137 previous similar messages Aug 02 15:39:44 fir-md1-s1 kernel: Lustre: MGS: 
Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 15:39:44 fir-md1-s1 kernel: Lustre: Skipped 1175 previous similar messages Aug 02 15:41:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 15:41:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 15:45:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 15:45:35 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 02 15:48:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 15:48:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 15:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 15:50:03 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 15:54:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 15:54:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 15:56:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 02 15:56:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 15:58:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 15:58:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 16:00:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 16:00:37 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 16:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2115f3dc00, cur 1564786954 expire 1564786804 last 1564786727 Aug 02 16:07:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 02 16:07:54 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 02 16:08:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b4da96800, cur 1564787326 expire 1564787176 last 1564787099 Aug 02 16:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 16:08:48 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 16:09:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 16:09:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 16:11:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 16:11:19 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 02 16:18:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 16:18:32 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 02 16:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 16:19:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 16:22:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 16:22:02 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 02 16:28:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 16:28:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 16:28:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 16:28:46 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 02 16:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 16:29:29 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 16:31:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b8c167a2-32f8-f5e3-bc45-a224b920567f (at 10.9.103.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f12d7b22c00, cur 1564788666 expire 1564788516 last 1564788439 Aug 02 16:31:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b8c167a2-32f8-f5e3-bc45-a224b920567f (at 10.9.103.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d3b633c00, cur 1564788670 expire 1564788520 last 1564788443 Aug 02 16:31:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 02 16:32:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 03d184f9-5efa-7cb0-6b43-79308f115e4f (at 10.9.103.17@o2ib4) in 175 seconds. I think it's dead, and I am evicting it. exp ffff8f37ad64e400, cur 1564788742 expire 1564788592 last 1564788567 Aug 02 16:32:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 03d184f9-5efa-7cb0-6b43-79308f115e4f (at 10.9.103.17@o2ib4) in 179 seconds. I think it's dead, and I am evicting it. exp ffff8f289a894800, cur 1564788746 expire 1564788596 last 1564788567 Aug 02 16:32:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 16:32:33 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Aug 02 16:33:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a07d54cc-57e0-dc91-4f95-17422e5e4d35 (at 10.9.103.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ebdd1bc00, cur 1564788794 expire 1564788644 last 1564788567 Aug 02 16:39:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 16:39:52 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 02 16:39:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 16:39:52 fir-md1-s1 kernel: Lustre: Skipped 12092 previous similar messages Aug 02 16:42:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 16:42:35 fir-md1-s1 kernel: Lustre: Skipped 12096 previous similar messages Aug 02 16:47:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 16:47:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 16:48:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9975124-36bb-1a51-a446-1a68380a4760 (at 10.9.108.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148b3a7c00, cur 1564789735 expire 1564789585 last 1564789508 Aug 02 16:50:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 16:50:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 16:50:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 16:50:46 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 16:51:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3411ffac-482d-1535-c486-9206f14b07f9 (at 10.9.103.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3f32942400, cur 1564789902 expire 1564789752 last 1564789675 Aug 02 16:51:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 16:52:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 16:52:48 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 17:00:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 17:00:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 17:01:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 17:01:14 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 17:01:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 17:01:41 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 02 17:03:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 17:03:04 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 02 17:11:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 17:11:28 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 17:13:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 17:13:30 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 02 17:14:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 17:14:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 17:16:54 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 17:16:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 17:21:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 17:21:30 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 17:23:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 17:23:39 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 02 17:25:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d6fef0c00, cur 1564791901 expire 1564791751 last 1564791674 Aug 02 17:25:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 02 17:26:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 17:26:07 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 17:31:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 17:31:42 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 17:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 17:33:53 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 02 17:38:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 02 17:38:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 17:39:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available 
for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 17:39:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 17:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 17:41:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 17:41:56 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 02 17:43:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 17:43:56 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 02 17:48:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 17:48:39 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 02 17:51:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 17:52:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 17:52:16 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 17:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 17:54:05 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 02 17:59:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 18:01:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 18:01:52 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 18:02:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 18:02:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 18:04:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 18:04:10 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 02 18:12:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 18:12:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 18:12:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 18:12:27 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 02 18:12:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 18:12:28 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 18:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 18:14:32 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 02 18:15:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c73e7f5b-a137-83ca-c193-48a039d02607 (at 10.9.101.39@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e80662c00, cur 1564794927 expire 1564794777 last 1564794700 Aug 02 18:22:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 18:22:33 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 18:22:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 18:22:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 18:22:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 18:22:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 18:25:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 18:25:00 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 02 18:32:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 18:32:36 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 02 18:33:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 18:33:28 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 18:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 18:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 18:36:01 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 02 18:37:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 18:37:20 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 18:43:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 18:43:44 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 18:44:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 18:44:24 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 02 18:46:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 18:46:02 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 02 18:54:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 18:54:07 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 18:55:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 18:55:05 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 02 18:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 18:56:07 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 02 18:56:59 fir-md1-s1 kernel: Lustre: 35232:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f129a285050 x1638896512647584/t0(0) o3->b71f7c83-f21b-f372-25d2-d5091bf74820@10.9.113.15@o2ib4:4/0 lens 488/440 e 1 to 0 dl 1564797424 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 19:04:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 19:04:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages 
Aug 02 19:05:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 19:05:07 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 02 19:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 19:06:16 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Aug 02 19:06:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 19:06:42 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 19:11:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 19:15:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 19:15:37 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 02 19:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 19:16:22 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 02 19:16:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 19:16:28 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 02 19:21:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 19:24:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f28960e6000, cur 1564799088 expire 1564798938 last 1564798861 Aug 02 19:24:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 02 19:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 19:25:49 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 19:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 19:26:24 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Aug 02 19:27:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 19:27:47 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 02 19:35:00 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0dc91da050 x1631652858226240/t0(0) o3->c1bbe4f4-a78a-a916-da69-f738d5b89f92@10.9.114.7@o2ib4:5/0 lens 488/440 e 1 to 0 dl 1564799705 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 19:36:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 19:36:08 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 19:36:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 19:36:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 19:36:28 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 19:37:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 19:38:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 19:38:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 19:39:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 19:41:28 fir-md1-s1 kernel: Lustre: 18782:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f13b7f7f850 x1638896555946608/t0(0) o3->b71f7c83-f21b-f372-25d2-d5091bf74820@10.9.113.15@o2ib4:3/0 lens 488/440 e 1 to 0 dl 1564800093 ref 2 fl Interpret:/0/0 rc 0/0 Aug 02 19:46:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 19:46:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 19:46:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 19:46:28 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 02 19:49:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 02 19:49:56 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 19:52:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 19:56:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 19:56:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 19:56:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 19:56:59 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 02 20:00:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 20:00:01 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 20:02:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 20:02:44 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 20:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 20:06:16 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 20:07:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 20:07:19 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 20:11:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 20:11:08 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 02 20:13:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 20:13:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 20:16:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 20:16:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 02 20:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 20:17:28 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 02 20:22:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 20:22:14 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 02 20:24:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 20:24:06 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 20:26:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 20:26:44 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 02 20:28:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 20:28:31 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 02 20:34:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 20:34:12 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 02 20:34:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 20:34:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 02 20:36:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 20:36:51 fir-md1-s1 kernel: Lustre: Skipped 54319 previous similar messages Aug 02 20:38:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 20:38:42 fir-md1-s1 kernel: Lustre: Skipped 54370 previous similar messages Aug 02 20:45:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 20:45:08 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 20:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 20:46:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 20:49:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 20:49:01 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 02 20:51:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 20:55:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 20:55:15 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 02 20:56:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 20:56:54 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 02 20:59:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 02 20:59:36 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 02 21:02:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 21:02:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 21:06:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 21:06:11 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 02 21:07:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 21:07:12 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 02 21:09:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 21:09:51 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 02 21:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 02 21:17:16 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 02 21:17:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 21:17:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 21:17:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 21:17:59 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 02 21:19:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 21:19:59 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 02 21:26:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20ba152c00, cur 1564806378 expire 1564806228 last 1564806151 Aug 02 21:27:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 21:27:43 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 02 21:28:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 21:28:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 21:28:11 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 02 21:28:11 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 02 21:30:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 21:30:14 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 02 21:38:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 21:38:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 02 21:38:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 21:38:56 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 21:40:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 21:40:33 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 02 21:41:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2760885400, cur 1564807264 expire 1564807114 last 1564807037 Aug 02 21:48:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 21:48:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 21:49:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 21:49:01 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 21:50:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 21:50:29 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 02 21:50:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 21:50:44 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 21:51:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 21:51:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 21:58:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 21:58:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 21:58:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 02 22:00:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 22:00:51 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 02 22:05:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 22:05:04 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 02 22:07:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 22:09:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 22:09:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 22:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 22:11:12 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 02 22:15:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 22:15:15 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 02 22:19:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 22:19:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 02 22:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 22:21:36 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 02 22:27:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 22:27:01 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 22:27:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 22:27:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 02 22:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 22:29:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 22:30:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 02 22:31:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 22:31:48 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 02 22:34:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 22:34:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 22:37:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 22:37:40 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 02 22:39:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 22:39:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 22:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 22:39:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 02 22:41:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 22:41:51 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 02 22:47:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 02 22:47:53 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 02 22:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 22:50:02 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 02 22:50:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 22:50:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 22:51:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 22:51:53 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 02 22:57:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f67b2d800, cur 1564811860 expire 1564811710 last 1564811633 Aug 02 22:57:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 22:57:58 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 02 23:00:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 02 23:00:22 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 02 23:01:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 23:01:56 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 02 23:09:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 23:09:28 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 02 23:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 23:10:39 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 02 23:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 02 23:11:58 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 02 23:12:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 23:15:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 23:15:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 02 23:19:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 23:19:29 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 02 23:20:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 02 23:20:46 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 02 23:22:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 23:22:07 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 02 23:23:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 23:30:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 23:30:50 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 02 23:30:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 02 23:30:57 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 02 23:31:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 02 23:31:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 02 23:32:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 02 23:32:42 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 02 23:41:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 02 23:41:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 02 23:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 02 23:42:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 02 23:43:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 02 23:43:02 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 02 23:50:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 02 23:51:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 02 23:51:48 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 02 23:52:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 02 23:52:12 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 02 23:53:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 02 23:53:06 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 00:00:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 00:00:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 00:02:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 00:02:04 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 03 00:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 00:02:24 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 00:03:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 00:03:10 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 03 00:12:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 03 00:12:24 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 03 00:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 00:12:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 00:13:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 00:13:40 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 03 00:21:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 00:21:02 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 00:22:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 00:22:32 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 03 00:22:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 00:22:42 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 03 00:23:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 00:23:51 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 03 00:32:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 00:32:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 00:32:39 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 00:32:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 00:32:45 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 00:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 00:34:40 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 03 00:39:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 00:39:11 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 00:43:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 00:43:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 00:43:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 00:43:30 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 00:44:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 00:44:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 00:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 00:44:53 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 00:53:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 00:53:49 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 03 00:53:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 00:53:53 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 00:54:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 00:54:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 00:54:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 00:54:55 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 03 01:03:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 01:03:50 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 01:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 01:04:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 03 01:05:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 01:05:10 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 01:07:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 01:07:26 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 01:13:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 01:13:56 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 03 01:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 01:14:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 01:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 01:15:17 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 03 01:17:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 01:17:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 01:24:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 03 01:24:26 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 03 01:24:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 01:24:35 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 01:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 01:25:29 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 01:28:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 01:28:49 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 01:34:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 01:34:37 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 01:35:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 01:35:18 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 01:35:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 01:35:40 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 03 01:39:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 01:39:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 01:44:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 01:44:47 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 03 01:45:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 01:45:20 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 01:45:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 01:45:45 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 03 01:54:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 01:54:03 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 03 01:55:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 01:55:14 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 01:55:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 01:55:37 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 01:55:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 01:55:46 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 03 02:05:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 02:05:57 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 03 02:05:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 02:05:57 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 03 02:08:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 02:08:06 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 03 02:08:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 02:08:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 02:15:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 02:15:58 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 03 02:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 02:16:02 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 02:20:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 02:20:08 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 03 02:21:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 02:26:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 02:26:02 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 03 02:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 02:26:20 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 02:27:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2440cc7400, cur 1564824461 expire 1564824311 last 1564824234 Aug 03 02:30:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 02:30:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 02:36:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 02:36:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 02:36:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 02:36:05 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 03 02:36:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 02:36:20 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 02:40:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 02:40:22 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Aug 03 02:46:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 02:46:11 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Aug 03 02:46:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 02:46:33 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 02:46:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 02:46:36 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 03 02:50:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 02:50:47 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 02:56:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 02:56:23 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 03 02:56:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 02:56:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 02:57:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 02:57:39 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 03:02:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 03:02:06 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 03 03:06:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 03:06:33 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 03 03:07:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 03:07:08 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 03:10:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 03:10:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 03:12:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 03:12:14 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 03 03:17:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 03:17:15 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 03:17:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 03:17:15 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 03 03:22:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 03:22:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 03 03:25:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 03:25:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 03:28:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 03:28:08 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 03 03:28:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 03:28:08 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 03 03:32:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 03:32:28 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 03 03:38:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 03:38:10 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 03 03:38:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 03:38:10 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 03:38:42 fir-md1-s1 kernel: Lustre: 35089:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1430f12050 x1640702107834832/t0(0) o3->b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9@10.9.114.15@o2ib4:17/0 lens 488/440 e 1 to 0 dl 1564828727 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 03:39:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 03:39:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 03:43:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 03 03:43:04 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 03 03:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 03:48:30 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 03:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 03:48:30 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 03:53:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 03:53:23 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 03:54:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 03:54:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 03:58:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 03:58:37 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 03 03:58:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 03:58:46 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 04:03:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 04:03:27 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 03 04:04:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 04:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 04:08:54 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 04:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 04:08:54 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 03 04:14:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 04:14:04 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 04:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 04:19:01 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 03 04:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 04:19:01 fir-md1-s1 kernel: Lustre: Skipped 50 
previous similar messages Aug 03 04:27:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 04:27:07 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 03 04:29:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 04:29:04 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 03 04:29:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 04:29:23 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 04:33:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32551cfc00, cur 1564832005 expire 1564831855 last 1564831778 Aug 03 04:38:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 04:38:12 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 03 04:39:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 04:39:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 04:39:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 04:39:14 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 04:39:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 04:39:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 04:44:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 04:44:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 04:49:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 04:49:11 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 03 04:49:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 04:49:30 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 03 04:49:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 04:49:42 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 03 04:54:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 04:59:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 04:59:22 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 03 04:59:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 04:59:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 04:59:46 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 03 04:59:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 04:59:46 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 03 05:05:28 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ed1046400, cur 1564833928 expire 1564833778 last 1564833701 Aug 03 05:09:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 03 05:09:30 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 05:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 05:09:48 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 05:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 05:09:48 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 03 05:13:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 05:13:22 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 05:19:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 05:19:32 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 03 05:19:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 05:19:52 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 05:19:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 05:19:52 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 05:20:24 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10831c5450 x1640778071224224/t0(0) o4->990e479b-e98b-9bd9-0468-eaed65b7d455@10.9.103.14@o2ib4:29/0 lens 2840/448 e 1 to 0 dl 1564834829 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 05:25:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 05:25:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 05:28:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 296d97ff-0de3-b3eb-25b6-28238cfb0a2e (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f280248a400, cur 1564835315 expire 1564835165 last 1564835088 Aug 03 05:29:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 05:29:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 05:29:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 05:29:54 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 03 05:31:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 05:31:35 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 03 05:31:38 fir-md1-s1 kernel: Lustre: 46571:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f10831c1050 x1638897655979392/t0(0) o3->b71f7c83-f21b-f372-25d2-d5091bf74820@10.9.113.15@o2ib4:13/0 lens 488/440 e 1 to 0 dl 1564835503 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 05:40:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 05:40:02 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 05:40:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 05:40:02 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 05:40:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 05:40:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 05:44:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 05:44:18 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 03 05:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 05:50:19 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 05:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 05:50:19 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 03 05:52:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 05:56:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 05:56:45 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 03 06:00:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 06:00:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 06:00:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 06:00:34 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 06:03:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 06:03:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 06:07:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 06:07:24 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 06:10:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 06:10:40 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 03 06:10:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 06:10:50 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 03 06:13:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 06:13:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 06:17:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 06:17:29 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 03 06:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 06:20:41 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 03 06:21:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 06:21:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 06:27:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 06:27:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 06:27:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 06:27:48 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 06:31:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 06:31:01 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 03 06:31:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 06:31:26 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 03 06:38:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 06:38:33 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 06:40:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 06:40:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 06:41:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 06:41:35 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 03 06:41:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 06:41:35 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 03 06:49:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 06:49:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 06:50:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 06:50:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 06:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 06:51:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 06:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 06:51:43 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 03 06:55:06 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 03 06:55:06 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 0, rc: 8 Aug 03 06:55:06 fir-md1-s1 kernel: LNetError: 21073:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Aug 03 06:55:06 fir-md1-s1 kernel: LNet: 
20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds Aug 03 06:55:06 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 10 previous similar messages Aug 03 06:55:06 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.201@o2ib7: accepting Aug 03 06:59:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 06:59:36 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 03 07:01:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 07:01:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 07:01:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 07:01:56 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 03 07:01:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 07:01:56 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Aug 03 07:11:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 03 07:11:07 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 03 07:12:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 07:12:05 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 07:12:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 07:12:05 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 03 07:16:07 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 07:16:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 07:21:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 07:21:09 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 07:23:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 07:23:12 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 03 07:23:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 07:23:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 07:26:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 07:26:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 07:31:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 07:31:11 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 03 07:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 07:33:57 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 07:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 07:33:57 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 23717:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=1, svcEst=1, delay=5691 Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 23717:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 6550:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f27b2c2cc50 x1639516236017184/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564843020 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 28233:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f4312f7bf00 x1637990187019296/t0(0) o103->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 28233:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 23455:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564843014/real 1564843014] req@ffff8f2a3ffa9e00 x1636754186866128/t0(0) o104->fir-MDT0000@10.8.29.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564843021 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 23455:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 03 07:37:01 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 1 seconds Aug 03 07:37:01 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 6 previous similar messages Aug 03 07:37:01 fir-md1-s1 kernel: LustreError: 21995:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after -1+1s req@ffff8f3986330850 x1640703333727840/t0(0) 
o3->4a1adff3-702e-e0b2-9a73-afa853da02e5@10.9.115.13@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564843020 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:01 fir-md1-s1 kernel: LustreError: 21995:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 19 previous similar messages Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 4a1adff3-702e-e0b2-9a73-afa853da02e5 (at 10.9.115.13@o2ib4), client will retry: rc -110 Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: 21995:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. req@ffff8f3986330850 x1640703333727840/t0(0) o3->4a1adff3-702e-e0b2-9a73-afa853da02e5@10.9.115.13@o2ib4:0/0 lens 488/440 e 0 to 0 dl 1564843020 ref 2 fl Complete:/0/ffffffff rc -110/-1 Aug 03 07:37:01 fir-md1-s1 kernel: LustreError: 22427:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f129a285050 x1639516236017056/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564843034 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:01 fir-md1-s1 kernel: LustreError: 22427:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 2 previous similar messages Aug 03 07:37:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 03 07:37:02 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1eb194e450 x1639516236017312/t0(0) o3->b9b7d443-6e99-c10b-4d68-3e3fa30c5530@10.9.113.5@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564843034 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 8a8b762b-3d50-8250-7301-05eab7cb4e19 (at 10.8.16.7@o2ib6), client will retry: rc -110 Aug 03 07:37:02 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 03 07:37:02 fir-md1-s1 kernel: LustreError: 21291:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 12 previous similar messages Aug 03 07:37:03 fir-md1-s1 kernel: LustreError: 
21543:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f19e06ce850 x1633740687405360/t0(0) o4->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:1/0 lens 488/448 e 0 to 0 dl 1564843051 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:03 fir-md1-s1 kernel: LustreError: 21543:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Aug 03 07:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6), client will retry: rc = -110 Aug 03 07:37:05 fir-md1-s1 kernel: LustreError: 21484:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f09597f4850 x1634128659254416/t0(0) o4->eef3e7bf-8b9f-8c5b-c710-00e4798713e4@10.9.104.71@o2ib4:27/0 lens 488/448 e 0 to 0 dl 1564843047 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with eef3e7bf-8b9f-8c5b-c710-00e4798713e4 (at 10.9.104.71@o2ib4), client will retry: rc = -110 Aug 03 07:37:05 fir-md1-s1 kernel: Lustre: 22226:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:5s); client may timeout. 
req@ffff8f3f11e8d850 x1631638332816816/t0(0) o4->339627b1-f298-e293-3cc1-dc6c48f43358@10.9.104.56@o2ib4:0/0 lens 488/448 e 0 to 0 dl 1564843020 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 03 07:37:05 fir-md1-s1 kernel: Lustre: 22226:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Aug 03 07:37:07 fir-md1-s1 kernel: LustreError: 27482:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f19e06ca850 x1631587056949824/t0(0) o3->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:1/0 lens 488/440 e 0 to 0 dl 1564843051 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1135836c-5fb6-92af-ade3-8ef6cf526018 (at 10.8.27.9@o2ib6), client will retry: rc -110 Aug 03 07:37:07 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 03 07:37:09 fir-md1-s1 kernel: Lustre: 46549:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23562ec450 x1635207496648160/t0(0) o3->577ac993-4ad9-0dce-4697-0326d1fd44f4@10.9.107.30@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564843034 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with b71f7c83-f21b-f372-25d2-d5091bf74820 (at 10.9.113.15@o2ib4), client will retry: rc -110 Aug 03 07:37:10 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 03 07:37:14 fir-md1-s1 kernel: LustreError: 46559:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f23562ec450 x1635207496648160/t0(0) o3->577ac993-4ad9-0dce-4697-0326d1fd44f4@10.9.107.30@o2ib4:14/0 lens 488/440 e 1 to 0 dl 1564843034 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:14 fir-md1-s1 kernel: LustreError: 46559:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Aug 03 07:37:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 577ac993-4ad9-0dce-4697-0326d1fd44f4 (at 
10.9.107.30@o2ib4), client will retry: rc -110 Aug 03 07:37:19 fir-md1-s1 kernel: Lustre: 21716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1ce9ac6450 x1639151598899104/t0(0) o3->ad5b8b9d-f149-444a-fb05-2479a0cbbcd5@10.8.15.10@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1564843044 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:20 fir-md1-s1 kernel: Lustre: 21365:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f22c9eccc50 x1631576853750704/t0(0) o4->0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a@10.8.8.11@o2ib6:25/0 lens 504/448 e 0 to 0 dl 1564843045 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:20 fir-md1-s1 kernel: Lustre: 21365:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 03 07:37:22 fir-md1-s1 kernel: Lustre: 21716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1eb194b050 x1635613875735664/t0(0) o4->b2eb5b9e-3da9-54b2-edec-186a2e3f10e1@10.8.23.9@o2ib6:27/0 lens 504/448 e 0 to 0 dl 1564843047 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:22 fir-md1-s1 kernel: Lustre: 21716:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 03 07:37:24 fir-md1-s1 kernel: LustreError: 46529:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 23+0s req@ffff8f1f0c67f450 x1639244715138960/t0(0) o3->a820bb5a-e007-7544-04a5-afedbe00ee4e@10.9.112.16@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1564843044 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ad5b8b9d-f149-444a-fb05-2479a0cbbcd5 (at 10.8.15.10@o2ib6), client will retry: rc -110 Aug 03 07:37:24 fir-md1-s1 kernel: LustreError: 46529:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Aug 03 07:37:25 fir-md1-s1 kernel: LustreError: 46558:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ 
Reconnect on bulk READ req@ffff8f1690f12850 x1639151598902080/t0(0) o3->ad5b8b9d-f149-444a-fb05-2479a0cbbcd5@10.8.15.10@o2ib6:1/0 lens 488/440 e 0 to 0 dl 1564843051 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 07:37:25 fir-md1-s1 kernel: LustreError: 46558:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 14 previous similar messages Aug 03 07:37:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a (at 10.8.8.11@o2ib6), client will retry: rc = -110 Aug 03 07:37:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a653829b-d51f-5632-8f04-92386793cbc4 (at 10.8.21.31@o2ib6), client will retry: rc = -110 Aug 03 07:37:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 03 07:37:26 fir-md1-s1 kernel: Lustre: 21540:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f17ce7e7050 x1631358339993376/t0(0) o4->a653829b-d51f-5632-8f04-92386793cbc4@10.8.21.31@o2ib6:25/0 lens 504/448 e 0 to 0 dl 1564843045 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 03 07:37:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with b2eb5b9e-3da9-54b2-edec-186a2e3f10e1 (at 10.8.23.9@o2ib6), client will retry: rc = -110 Aug 03 07:37:28 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f1eb194b050 x1635613875735664/t0(0) o4->b2eb5b9e-3da9-54b2-edec-186a2e3f10e1@10.8.23.9@o2ib6:27/0 lens 504/448 e 0 to 0 dl 1564843047 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 03 07:37:28 fir-md1-s1 kernel: Lustre: 44036:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 03 07:40:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 07:40:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 07:41:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 07:41:11 fir-md1-s1 kernel: Lustre: Skipped 249 previous similar messages Aug 03 07:42:54 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f215f8000 x1631616217285008/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:29/0 lens 480/568 e 0 to 0 dl 1564843379 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 07:42:54 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 03 07:42:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.29.1@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f31372d5e80/0x5d9ee6a9488ccf23 lrc: 3/0,0 mode: PW/PW res: [0x2c002c04a:0x41f2:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6dc06b8f288 expref: 15 pid: 23723 timeout: 3958438 lvb_type: 0 Aug 03 07:42:58 fir-md1-s1 kernel: LustreError: 21674:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f34fdfe9000 ns: mdt-fir-MDT0002_UUID lock: ffff8f324ab18240/0x5d9ee6a94a93204b lrc: 3/0,0 mode: PW/PW res: [0x2c002c04a:0x41f2:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.29.1@o2ib6 remote: 0x3ac5b6dc06ba342d expref: 11 pid: 21674 timeout: 0 lvb_type: 0 Aug 03 07:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 07:44:10 fir-md1-s1 kernel: Lustre: Skipped 552 previous similar messages Aug 03 07:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 
07:44:10 fir-md1-s1 kernel: Lustre: Skipped 817 previous similar messages Aug 03 07:51:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 07:51:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 07:51:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 07:51:27 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 03 07:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 07:54:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 03 07:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 07:54:23 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 03 08:03:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 08:03:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 08:04:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 08:04:09 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 03 08:04:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 08:04:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 08:04:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 08:04:34 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 03 08:13:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 08:13:44 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 08:14:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 03 08:14:48 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 03 08:14:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 08:14:48 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 03 08:15:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 08:15:07 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 08:24:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 08:24:51 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 08:24:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Aug 03 08:24:51 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 03 08:25:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 08:25:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 08:26:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 08:26:14 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 08:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 08:35:38 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 08:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 08:35:38 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 08:36:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 08:36:15 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 03 08:38:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 08:38:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 08:45:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 08:45:55 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 08:45:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 08:45:55 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 03 08:47:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 08:47:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 08:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 08:56:06 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 03 08:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 08:56:06 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 03 08:58:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 08:58:53 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 03 09:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 09:06:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 09:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 09:06:16 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 03 09:08:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 09:08:53 fir-md1-s1 kernel: Lustre: Skipped 
28 previous similar messages Aug 03 09:10:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 09:10:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 09:12:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 09:15:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 09:17:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 09:17:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 09:17:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 09:17:06 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 03 09:18:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 09:18:58 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 03 09:26:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 09:27:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 09:27:20 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 09:27:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 09:27:20 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 03 09:30:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 09:30:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 09:37:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 09:37:30 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 03 09:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 09:37:30 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 09:38:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 09:38:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 09:40:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 09:40:23 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 03 09:47:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 09:47:36 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 03 09:47:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 09:47:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 03 09:49:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 09:49:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 09:50:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 09:50:26 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 09:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 09:57:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 09:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 09:57:56 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 03 10:00:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 03 10:00:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 10:04:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 10:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 10:08:10 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 10:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 10:08:10 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 10:11:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 10:11:10 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 03 10:18:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 10:18:23 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 10:18:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 10:18:23 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 03 10:18:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 10:18:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 03 10:21:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 10:21:13 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 03 10:28:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 10:28:45 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 10:28:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 10:28:45 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 03 10:31:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 10:31:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 10:35:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 10:35:03 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 03 10:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 10:39:15 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 03 10:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 10:39:15 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 10:45:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 10:45:43 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 03 10:49:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 
(at 10.8.30.23@o2ib6) reconnecting Aug 03 10:49:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 03 10:49:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 10:49:22 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 03 10:51:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 10:51:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 10:55:57 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a4acae050 x1638899108947904/t0(0) o3->b71f7c83-f21b-f372-25d2-d5091bf74820@10.9.113.15@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1564854962 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 10:57:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 10:57:26 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 10:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 10:59:29 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 10:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 10:59:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 03 11:01:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 11:01:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 03 11:07:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 11:07:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 11:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 11:09:30 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 11:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 11:09:30 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 03 11:15:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 11:15:15 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 11:18:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 11:18:04 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 03 11:20:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 11:20:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 03 11:20:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 11:20:01 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 03 11:26:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 11:26:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 11:28:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 11:28:10 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 11:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 11:30:06 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 11:30:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 11:30:21 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 11:37:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 11:37:00 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 11:40:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 11:40:11 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 03 11:40:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 11:40:11 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 11:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 11:42:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 11:48:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 11:48:09 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 03 11:50:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 11:50:27 fir-md1-s1 kernel: Lustre: Skipped 56605 previous similar messages Aug 03 11:51:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 11:51:40 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 11:53:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 11:53:09 fir-md1-s1 kernel: Lustre: Skipped 56595 previous similar messages Aug 03 11:55:54 fir-md1-s1 kernel: LustreError: 46587:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f184114d450 x1631353628746000/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:29/0 lens 488/448 e 0 to 0 dl 1564858559 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 11:55:54 fir-md1-s1 kernel: LustreError: 46587:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Aug 03 11:55:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 03 12:00:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 12:00:29 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 03 12:01:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 12:01:48 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 03 12:03:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 12:03:11 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 03 12:10:07 fir-md1-s1 
kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 03 12:10:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 12:10:37 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 03 12:11:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 12:13:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 12:13:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 03 12:13:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 12:13:13 fir-md1-s1 kernel: Lustre: Skipped 64864 previous similar messages Aug 03 12:14:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 12:19:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 12:19:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 12:21:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 12:21:17 fir-md1-s1 kernel: Lustre: Skipped 64892 previous similar messages Aug 03 12:23:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 12:23:09 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 03 12:23:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 12:23:44 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 12:25:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 12:31:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 12:31:17 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 03 12:33:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 12:33:16 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 03 12:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 12:34:24 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 03 12:37:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 12:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 12:42:01 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 03 12:43:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 12:43:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 12:44:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 12:44:40 fir-md1-s1 kernel: Lustre: Skipped 19319 previous similar messages Aug 03 12:52:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 12:52:41 fir-md1-s1 kernel: Lustre: Skipped 19344 previous similar messages Aug 03 12:54:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 12:54:53 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 03 12:54:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 12:54:59 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 03 13:02:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 13:02:44 fir-md1-s1 kernel: Lustre: Skipped 2138 previous similar messages Aug 03 13:05:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 13:05:11 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 13:05:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 13:05:14 fir-md1-s1 kernel: Lustre: Skipped 2099 previous similar messages Aug 03 13:06:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 13:06:28 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 03 13:06:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 13:08:33 fir-md1-s1 kernel: Lustre: 23679:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564862906/real 1564862906] req@ffff8f34ee384e00 x1636754413168416/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564862913 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:08:40 fir-md1-s1 kernel: Lustre: 23603:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564862913/real 1564862913] req@ffff8f3273249b00 x1636754413370160/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564862920 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:09:01 fir-md1-s1 kernel: Lustre: 23645:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564862934/real 1564862934] req@ffff8f2ea3f0b600 x1636754414010928/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564862941 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:09:52 fir-md1-s1 kernel: Lustre: 26255:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564862985/real 1564862985] req@ffff8f18a2f34500 x1636754415580720/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 
296/280 e 0 to 1 dl 1564862992 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:11:09 fir-md1-s1 kernel: Lustre: 23671:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f318e663c00 x1639440994919408/t0(0) o36->f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b@10.8.0.65@o2ib6:14/0 lens 528/2888 e 1 to 0 dl 1564863074 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 13:11:10 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a80279e00 x1631611792045680/t0(0) o101->c0855e8e-4398-d036-706b-ca397c044b92@10.8.30.12@o2ib6:15/0 lens 576/3264 e 1 to 0 dl 1564863075 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 13:11:10 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 03 13:11:12 fir-md1-s1 kernel: Lustre: 23671:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ebf06c200 x1631565717936192/t0(0) o101->a534a15b-a672-0f0f-d166-77ac777fad65@10.8.18.35@o2ib6:17/0 lens 576/3264 e 1 to 0 dl 1564863077 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 13:11:12 fir-md1-s1 kernel: Lustre: 23671:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 03 13:11:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 13:11:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 13:11:56 fir-md1-s1 kernel: Lustre: 50445:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564863109/real 1564863109] req@ffff8f1c51517500 x1636754418302576/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564863116 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:12:47 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564863160/real 1564863160] req@ffff8f2138c92700 x1636754419134240/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564863167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:13:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 13:13:23 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 03 13:13:46 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2edaa79500 x1635092804754576/t0(0) o101->9d861c36-1c2e-ea43-579d-a9a1e7d701be@10.8.11.15@o2ib6:21/0 lens 576/3264 e 0 to 0 dl 1564863231 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 13:13:46 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Aug 03 13:14:24 fir-md1-s1 kernel: LustreError: 21540:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1de6eb2850 x1631353630712224/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:23/0 lens 504/448 e 0 to 0 dl 1564863293 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 13:14:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 03 13:14:32 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request 
sent has timed out for slow reply: [sent 1564863265/real 1564863265] req@ffff8f2080933900 x1636754421886832/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564863272 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:15:18 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f198047c500 x1639441006222912/t0(0) o101->f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b@10.8.0.65@o2ib6:23/0 lens 480/568 e 0 to 0 dl 1564863323 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 13:15:18 fir-md1-s1 kernel: Lustre: 22004:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Aug 03 13:15:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 03 13:15:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 13:15:29 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f30b1973cc0/0x5d9ee6a9ea110e02 lrc: 3/0,0 mode: PW/PW res: [0x2c002c7b2:0xf:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2268f5e91 expref: 47 pid: 23728 timeout: 3978389 lvb_type: 0 Aug 03 13:15:44 fir-md1-s1 kernel: Lustre: 22004:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564863337/real 1564863337] req@ffff8f1ca0946000 x1636754423822000/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564863344 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 13:15:44 fir-md1-s1 kernel: Lustre: 22004:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 03 13:18:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 13:18:14 fir-md1-s1 kernel: Lustre: Skipped 
36 previous similar messages Aug 03 13:18:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 13:18:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 13:23:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 13:23:28 fir-md1-s1 kernel: Lustre: Skipped 23308 previous similar messages Aug 03 13:23:36 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1cb0953850 x1631353631162128/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:11/0 lens 488/448 e 0 to 0 dl 1564863821 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 13:23:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 03 13:25:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 13:25:45 fir-md1-s1 kernel: Lustre: Skipped 23300 previous similar messages Aug 03 13:28:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 13:28:15 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 13:33:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 13:33:29 fir-md1-s1 kernel: Lustre: Skipped 62021 previous similar messages Aug 03 13:36:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 13:36:02 fir-md1-s1 kernel: Lustre: Skipped 61978 previous similar messages Aug 03 13:37:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 13:37:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 13:39:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 13:39:38 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 03 13:43:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 13:43:46 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 13:46:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 13:46:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 03 13:47:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 13:47:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 13:51:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 13:51:35 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 03 13:53:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 13:53:47 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 13:54:02 fir-md1-s1 kernel: Lustre: 49252:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0f6c1fa850 x1639301270225824/t0(0) o3->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:6/0 lens 488/440 e 1 to 0 dl 1564865646 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 13:56:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 13:56:42 fir-md1-s1 kernel: Lustre: Skipped 43229 previous similar messages Aug 03 14:03:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f37203e0c00, cur 1564866214 expire 1564866064 last 1564865987 Aug 03 14:03:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 03 14:04:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 14:04:32 fir-md1-s1 kernel: Lustre: Skipped 43231 previous similar messages Aug 03 14:04:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 14:04:58 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 03 14:06:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 14:06:51 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 14:07:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 14:15:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 14:15:04 fir-md1-s1 kernel: Lustre: Skipped 12205 previous similar messages Aug 03 14:15:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 03 14:15:57 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 03 14:16:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 14:16:56 fir-md1-s1 kernel: Lustre: Skipped 12190 previous similar messages Aug 03 14:17:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 14:17:52 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 14:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 14:25:04 fir-md1-s1 kernel: Lustre: Skipped 7059 previous similar messages Aug 03 14:27:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 14:27:06 fir-md1-s1 kernel: Lustre: Skipped 7033 previous similar messages Aug 03 14:27:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 14:27:13 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 14:29:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 14:29:27 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 14:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 14:35:46 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 03 14:37:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 14:37:10 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 14:37:36 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1edd775000, cur 1564868256 expire 1564868106 last 1564868029 Aug 03 14:39:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 14:39:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 03 14:45:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 14:45:47 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 03 14:46:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 14:46:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 14:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 14:47:19 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 03 14:49:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 14:49:48 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 14:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 14:56:00 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Aug 03 14:56:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 14:56:48 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 14:57:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 14:57:45 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 03 15:00:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 15:00:28 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 03 15:05:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f3483d400, cur 1564869935 expire 1564869785 last 1564869708 Aug 03 15:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 15:06:04 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 03 15:07:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 15:07:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 15:07:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 15:07:59 fir-md1-s1 kernel: Lustre: Skipped 50155 previous similar messages Aug 03 15:11:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 15:11:59 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 15:16:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 15:16:10 fir-md1-s1 kernel: Lustre: Skipped 50164 previous similar messages Aug 03 15:18:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 15:18:53 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 03 15:22:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 15:22:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 03 15:26:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 15:26:10 fir-md1-s1 kernel: Lustre: Skipped 72249 previous similar messages Aug 03 15:26:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 15:26:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 15:28:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 15:28:57 fir-md1-s1 kernel: Lustre: Skipped 72204 previous similar messages Aug 03 15:32:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 15:32:04 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 03 15:36:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 15:36:11 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 03 15:38:41 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f222c838c50 x1631353636137744/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:16/0 lens 488/448 e 1 to 0 dl 1564871926 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 15:38:46 fir-md1-s1 kernel: LustreError: 46549:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f222c838c50 x1631353636137744/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:16/0 lens 488/448 e 1 to 0 dl 1564871926 ref 1 fl Interpret:/0/0 rc 0/0 Aug 03 15:38:46 fir-md1-s1 kernel: LustreError: 46549:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Aug 03 15:38:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 03 15:39:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 15:39:22 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 15:39:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 15:39:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 15:42:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 15:42:31 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 03 15:44:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client de6c0881-2c5e-8ab2-d83e-992caf69004f (at 10.8.2.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2501a6e800, cur 1564872242 expire 1564872092 last 1564872015 Aug 03 15:46:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 15:46:57 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 15:49:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 15:49:37 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 15:50:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 15:50:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 15:52:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 15:52:32 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 03 15:57:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 15:57:00 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 03 16:00:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 16:00:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 16:03:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 16:03:37 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 16:07:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 16:07:14 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 03 16:08:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 16:08:46 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 16:11:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 16:11:05 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 03 16:14:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 16:14:19 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 03 16:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 16:17:16 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 03 16:21:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 16:21:10 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 16:25:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 16:25:29 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 16:25:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 16:25:36 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Aug 03 16:27:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 16:27:16 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 16:31:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 16:31:13 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 16:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e479a6400, cur 1564875233 expire 1564875083 last 1564875006 Aug 03 16:33:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 03 16:35:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 16:35:35 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 16:36:58 fir-md1-s1 kernel: Lustre: 22279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564875411/real 1564875411] req@ffff8f1d605dbf00 x1636754568881408/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564875418 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 16:37:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 16:37:38 fir-md1-s1 kernel: Lustre: Skipped 100760 previous similar messages Aug 03 16:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 16:41:22 fir-md1-s1 kernel: Lustre: Skipped 100743 previous similar messages Aug 03 16:44:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 16:44:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 16:44:38 fir-md1-s1 kernel: LustreError: 22285:0:(mdt_lvb.c:430:mdt_lvbo_fill()) fir-MDT0000: small buffer size 632 for EA 656 (max_mdsize 1256): rc = -34 Aug 03 16:47:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 16:47:11 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 16:47:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 16:47:42 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 03 16:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 16:51:43 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 03 16:57:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 16:57:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 16:57:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 16:57:47 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 03 16:59:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 16:59:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 17:01:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 17:01:56 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 03 17:04:10 fir-md1-s1 kernel: Lustre: 10586:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564877043/real 1564877043] req@ffff8f3ec0715100 x1636754579282384/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564877050 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 17:05:13 fir-md1-s1 kernel: Lustre: 23567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0ece371e00 x1638091394031712/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:18/0 lens 1872/3288 e 0 to 0 dl 1564877118 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:05:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1df4f31200/0x5d9ee6aa94f04496 lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2268fdc59 expref: 90 pid: 21461 timeout: 3992178 lvb_type: 0 Aug 03 17:07:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 17:07:50 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 03 17:07:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 17:07:50 fir-md1-s1 kernel: Lustre: Skipped 61752 previous similar messages Aug 03 17:11:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 17:11:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 17:11:59 fir-md1-s1 kernel: Lustre: Skipped 61711 previous similar messages Aug 03 17:17:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 17:17:53 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 03 17:17:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 17:17:53 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 03 17:22:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 17:22:20 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 03 17:23:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f305501a800, cur 1564878227 expire 1564878077 last 1564878000 Aug 03 17:24:22 fir-md1-s1 kernel: Lustre: 97669:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564878255/real 1564878255] req@ffff8f1b34cfd100 x1636754585711760/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564878262 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 17:28:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 17:28:06 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 03 17:29:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 17:29:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 03 17:31:26 fir-md1-s1 kernel: Lustre: 23653:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f07233d8900 x1631568575333824/t0(0) o101->eafaef03-bf23-6214-eeef-c768f6a5fb7d@10.9.105.58@o2ib4:1/0 lens 584/3264 e 1 to 0 dl 1564878691 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:31:32 fir-md1-s1 kernel: Lustre: 23653:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f139e669500 x1638091394613616/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:7/0 lens 1784/3288 e 0 to 0 dl 1564878697 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:31:38 fir-md1-s1 kernel: Lustre: 10584:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3fcbc36900 x1640795253277184/t0(0) o101->a1d65202-4b4c-07d9-f12d-b432157293c9@10.9.115.6@o2ib4:13/0 lens 584/3264 e 0 to 0 dl 1564878703 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:31:41 fir-md1-s1 kernel: Lustre: 97644:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply 
req@ffff8f193cc65d00 x1640804977893664/t0(0) o101->054c050c-b1f2-4a76-25b0-a8bdcd9b4415@10.9.109.37@o2ib4:16/0 lens 584/3264 e 0 to 0 dl 1564878706 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:31:41 fir-md1-s1 kernel: Lustre: 97644:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 03 17:31:50 fir-md1-s1 kernel: Lustre: 20725:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f218a355d00 x1638054383298800/t0(0) o101->5af96943-1c02-797b-06d5-66725698e995@10.9.105.48@o2ib4:25/0 lens 584/3264 e 0 to 0 dl 1564878715 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:31:50 fir-md1-s1 kernel: Lustre: 20725:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 03 17:31:59 fir-md1-s1 kernel: Lustre: 23614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-6), not sending early reply req@ffff8f4502f41b00 x1634177111317328/t0(0) o101->27192dc1-dd13-e373-1c53-d70304c1bb94@10.9.109.59@o2ib4:3/0 lens 584/3264 e 0 to 0 dl 1564878723 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 17:32:04 fir-md1-s1 kernel: Lustre: 21675:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f4502f41b00 x1634177111317328/t0(0) o101->27192dc1-dd13-e373-1c53-d70304c1bb94@10.9.109.59@o2ib4:3/0 lens 584/536 e 0 to 0 dl 1564878723 ref 1 fl Complete:/0/0 rc 0/0 Aug 03 17:32:04 fir-md1-s1 kernel: Lustre: 21675:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 03 17:32:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 17:32:32 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 03 17:36:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 17:38:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 17:38:20 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 03 17:40:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 17:40:13 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 03 17:40:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 17:42:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 17:42:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 17:46:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 17:48:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 17:48:31 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 03 17:49:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1b3c3ad400, cur 1564879798 expire 1564879648 last 1564879571 Aug 03 17:51:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 03 17:51:30 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 17:52:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 17:52:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 17:55:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 17:55:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 17:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 17:58:38 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 03 18:01:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 18:01:33 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 18:02:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 18:02:58 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 18:07:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 18:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 18:08:51 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 03 18:12:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 18:12:01 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 03 18:13:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 18:13:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 03 18:18:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 18:18:51 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 03 18:20:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 18:20:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 18:23:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 18:23:11 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 03 18:23:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 18:23:48 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 03 18:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 18:28:54 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 03 18:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 18:33:27 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 03 18:34:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 18:34:21 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 03 18:35:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 18:39:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 18:39:56 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 03 18:43:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 18:43:27 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 18:44:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 18:44:26 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 03 18:50:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 18:50:02 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 03 18:51:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 18:51:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 18:53:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 18:53:36 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 18:54:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 18:54:44 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 03 19:00:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 19:00:14 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 03 19:04:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 19:04:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 19:04:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 03 19:04:46 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 03 19:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 19:10:56 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 19:12:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 19:12:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 19:14:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 19:14:33 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 19:15:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 19:15:52 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 03 19:18:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 19:21:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 19:21:09 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 03 19:21:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 19:21:44 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 19:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 19:24:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 19:26:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 03 19:26:35 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 03 19:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf5acf74-b620-4982-7052-0dc275e44804 (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2cbd7b8800, cur 1564885660 expire 1564885510 last 1564885433 Aug 03 19:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 19:31:09 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 03 19:35:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 19:35:08 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 19:38:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 19:38:17 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 03 19:40:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 19:40:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 19:41:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 19:41:11 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 03 19:42:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 19:45:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 19:45:29 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 19:46:03 fir-md1-s1 kernel: Lustre: 26255:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564886756/real 1564886756] req@ffff8f1b8efe5100 x1636754630707600/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564886763 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 19:49:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 19:49:41 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 03 19:51:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 19:51:11 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 03 19:53:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 19:55:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 19:55:38 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 19:59:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 19:59:47 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 03 20:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 20:01:42 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 03 20:06:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 20:06:42 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 20:09:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 20:09:47 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 03 20:11:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 20:11:47 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 03 20:14:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 20:15:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 20:15:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 20:16:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 20:16:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 20:20:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 20:21:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 20:21:47 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 03 20:21:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 20:21:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 20:21:56 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 03 20:22:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 20:27:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 20:27:14 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 20:31:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 20:31:49 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 20:31:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 20:31:58 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 03 20:35:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 20:36:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 20:37:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 20:37:58 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 20:38:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 20:42:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 20:42:19 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 03 20:44:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 20:44:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 20:44:03 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 20:48:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 20:48:20 fir-md1-s1 kernel: Lustre: Skipped 46544 previous similar messages Aug 03 20:49:11 fir-md1-s1 kernel: Lustre: 23723:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564890544/real 1564890544] req@ffff8f2e44322400 x1636754652282928/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564890551 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 20:49:18 fir-md1-s1 kernel: Lustre: 23723:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564890551/real 1564890551] req@ffff8f2e44322400 x1636754652282928/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564890558 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 20:52:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 20:52:43 fir-md1-s1 kernel: Lustre: Skipped 46560 previous similar messages Aug 03 20:54:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 20:54:12 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 03 20:58:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 20:58:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 20:58:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 20:58:49 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 21:00:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 21:02:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 21:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 21:02:57 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 03 21:04:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 21:04:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 21:04:37 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 03 21:08:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 21:08:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 21:08:52 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 21:12:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 21:12:57 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 03 21:15:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 21:15:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 21:15:33 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 03 21:19:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 21:19:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 21:22:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 21:22:59 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 03 21:24:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f44a2357800, cur 1564892641 expire 1564892491 last 1564892414 Aug 03 21:24:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 03 21:27:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 21:27:07 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 21:28:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 03 21:29:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 21:29:49 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 03 21:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 21:33:14 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 03 21:38:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 21:38:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 03 21:39:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 21:39:55 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 21:43:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 21:43:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 21:43:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 21:43:25 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 03 21:49:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 21:49:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 21:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 21:50:03 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 21:53:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 21:53:26 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 03 21:55:57 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894550/real 1564894550] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894557 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 03 21:56:04 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894557/real 1564894557] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894564 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:56:05 fir-md1-s1 kernel: Lustre: 23614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f40a2f16000 x1631560625055904/t0(0) o101->ef523b6a-10a9-bdd6-3e05-0d5f1df0af3f@10.9.108.14@o2ib4:10/0 lens 1792/3288 e 1 to 0 dl 1564894570 ref 2 fl Interpret:/0/0 rc 0/0 Aug 03 21:56:05 fir-md1-s1 kernel: Lustre: 
23614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 03 21:56:11 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894564/real 1564894564] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894571 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:56:18 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894571/real 1564894571] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894578 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:56:25 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894578/real 1564894578] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894585 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:56:39 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894592/real 1564894592] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894599 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:56:39 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 03 21:56:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 21:56:48 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 21:57:00 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894613/real 1564894613] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894620 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:57:00 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 03 21:57:35 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564894648/real 1564894648] req@ffff8f3fc8951800 x1636754671552352/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1564894655 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 03 21:57:35 fir-md1-s1 kernel: Lustre: 23713:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 03 21:58:24 fir-md1-s1 kernel: LustreError: 23713:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.109.37@o2ib4) failed to reply to blocking AST (req@ffff8f3fc8951800 x1636754671552352 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1f6909b180/0x5d9ee6ab2de0584b lrc: 4/0,0 mode: PR/PR res: [0x200029768:0x669:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.9.109.37@o2ib4 remote: 0x5d1479f42751df25 expref: 16 pid: 97639 timeout: 4009906 lvb_type: 0 Aug 03 21:58:24 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.109.37@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 03 21:58:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.109.37@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1f6909b180/0x5d9ee6ab2de0584b lrc: 3/0,0 mode: PR/PR res: [0x200029768:0x669:0x0].0x0 bits 
0x13/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.9.109.37@o2ib4 remote: 0x5d1479f42751df25 expref: 17 pid: 97639 timeout: 0 lvb_type: 0 Aug 03 21:58:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 054c050c-b1f2-4a76-25b0-a8bdcd9b4415 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f36c61a0000, cur 1564894739 expire 1564894589 last 1564894512 Aug 03 22:00:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 22:00:30 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 03 22:00:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 22:00:38 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 03 22:03:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 22:03:28 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 03 22:10:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 22:10:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 03 22:10:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 22:10:48 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 22:12:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 22:12:33 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 03 22:13:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 22:13:31 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 22:20:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 22:20:50 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 03 22:22:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 22:22:36 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 03 22:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 22:24:14 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 03 22:28:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 22:28:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 22:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 03 22:31:09 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 03 22:32:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 03 22:32:49 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 22:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 03 22:34:15 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 03 22:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 22:41:25 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 03 22:43:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 22:43:23 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 03 22:44:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 22:44:17 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 03 22:51:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 22:51:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 22:52:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 22:52:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 22:54:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 03 22:54:03 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 03 22:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 22:54:30 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 22:54:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 22:59:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 23:01:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 03 23:01:35 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 03 23:04:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 23:04:41 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 03 23:05:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 03 23:05:12 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 03 23:08:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 23:08:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 03 23:11:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 03 23:11:50 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 03 23:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 23:14:46 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 03 23:16:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 23:16:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 03 23:22:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 23:22:02 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 03 23:23:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 23:23:09 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 03 23:24:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 03 23:24:50 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 03 23:26:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 23:26:09 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 03 23:32:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 23:32:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 03 23:34:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 03 23:34:51 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Aug 03 23:35:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 03 23:35:55 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 03 23:36:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 23:36:22 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 03 23:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 03 23:42:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 03 23:45:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 03 23:45:00 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 03 23:46:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 03 23:46:43 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 03 23:47:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 23:47:14 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 03 23:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 03 23:52:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 03 23:55:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 03 23:55:35 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 03 23:58:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 03 23:58:19 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 04 00:02:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Aug 04 00:02:36 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 00:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1d4d66b7-4090-a2a1-c8a3-bca6f6637eca (at 10.9.106.67@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251fd19000, cur 1564902232 expire 1564902082 last 1564902005 Aug 04 00:03:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 04 00:05:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 00:05:40 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 04 00:07:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 00:08:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 00:08:27 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 04 00:09:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 00:13:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 00:13:02 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 00:13:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 00:15:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 00:15:46 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 04 00:17:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e53089e0-0379-2982-632f-afbd57f75e4f (at 10.8.2.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bc52be400, cur 1564903026 expire 1564902876 last 1564902799 Aug 04 00:17:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 04 00:20:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 00:20:49 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 00:23:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 00:23:04 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 00:24:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 00:24:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 00:26:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 00:26:09 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 04 00:30:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 00:30:51 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 00:33:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 00:33:17 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 00:35:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 00:35:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 00:36:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 00:36:20 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 04 00:41:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 00:41:17 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 04 00:43:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 00:43:23 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 04 00:46:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 00:46:28 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 04 00:48:50 fir-md1-s1 kernel: Lustre: 23455:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has 
timed out for slow reply: [sent 1564904923/real 1564904923] req@ffff8f1e201a2100 x1636754731204544/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564904930 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 00:48:50 fir-md1-s1 kernel: Lustre: 23455:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 04 00:51:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 00:51:22 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 04 00:52:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 00:52:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 00:53:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 00:53:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 00:56:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 00:56:36 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 04 01:01:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 01:01:47 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 04 01:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 01:04:31 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 01:06:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 01:06:47 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 01:12:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP 
connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 01:12:58 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 01:13:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 01:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 01:14:32 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 01:16:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 01:16:52 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 04 01:18:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 01:24:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 01:24:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 01:25:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 01:25:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 01:27:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 01:27:08 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 04 01:27:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 01:27:50 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 04 01:31:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 01:31:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 01:35:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 01:35:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 01:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 01:37:13 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 01:41:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 01:41:10 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 01:45:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 01:45:33 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 01:47:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 01:47:33 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 01:48:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 01:48:00 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 01:48:37 fir-md1-s1 kernel: Lustre: 23678:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3484c12100 x1631771885735264/t0(0) o36->236cf0e9-1d9b-0604-f09b-9a800534708f@10.8.24.29@o2ib6:12/0 lens 528/2888 e 1 to 0 dl 1564908522 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 01:48:37 fir-md1-s1 kernel: Lustre: 23678:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 04 01:48:51 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f20fad20d80/0x5d9ee6ab9871f390 lrc: 3/0,0 mode: PR/PR res: [0x2c002c5c4:0x1960b:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226907c8e expref: 40 pid: 20462 timeout: 4023591 lvb_type: 0 Aug 04 01:52:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 04 01:52:59 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 04 01:57:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 01:57:40 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 01:58:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 
01:58:30 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 02:00:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 02:00:02 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 02:03:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 04 02:03:08 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 04 02:05:24 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f287d8f1400, cur 1564909524 expire 1564909374 last 1564909297 Aug 04 02:05:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 04 02:07:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 02:07:50 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 04 02:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 02:08:48 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 02:13:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 02:13:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 04 02:13:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 02:18:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 02:18:41 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 02:19:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 02:19:08 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 02:19:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee31fc000, cur 1564910366 expire 1564910216 last 1564910139 Aug 04 02:23:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 02:23:23 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 04 02:29:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 02:29:00 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 02:29:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 02:29:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 04 02:33:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 02:33:49 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 02:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 02:39:09 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 04 02:40:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 02:40:02 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 
02:44:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 02:44:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 02:49:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 02:49:16 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 04 02:49:58 fir-md1-s1 kernel: Lustre: 21127:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564912191/real 1564912191] req@ffff8f2675d69e00 x1636754766396784/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564912198 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 02:49:58 fir-md1-s1 kernel: Lustre: 21127:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 04 02:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 02:50:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 04 02:54:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 02:54:12 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 02:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 02:59:20 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 04 03:00:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 03:00:57 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 03:03:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 03:04:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 03:04:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 03:07:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 03:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 03:09:50 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 04 03:11:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 03:11:02 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 03:13:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ff6079c00, cur 1564913591 expire 1564913441 last 1564913364 Aug 04 03:15:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 04 03:15:05 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 03:20:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 03:20:01 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 03:21:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 03:21:15 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 04 03:24:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c0dcb7800, cur 1564914263 expire 1564914113 last 1564914036 Aug 04 03:24:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 03:25:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 03:25:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 03:25:52 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 03:26:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 03:30:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 03:30:02 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 04 03:31:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 03:31:29 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 03:35:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 03:37:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 03:37:10 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 04 03:40:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 03:40:07 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 04 03:41:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 03:41:40 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 03:43:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 03:43:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 03:48:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 03:48:51 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 03:49:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 03:49:17 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 03:50:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 03:50:08 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 04 03:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 03:51:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 04:00:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 04:00:26 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 04:02:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 04:02:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 04:02:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 04:02:16 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 04:06:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 04:06:59 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 04 04:11:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 04:11:03 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 04 04:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 04:12:32 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 04:12:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 04:12:39 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 04:17:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b684f143-828e-ce3c-dd9e-7161a1b78891 (at 10.9.101.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34feab7400, cur 1564917435 expire 1564917285 last 1564917208 Aug 04 04:17:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 04:17:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 04:21:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 04:21:14 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 04 04:22:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 04:22:34 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 04:23:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 04:23:05 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 04:31:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 04:31:34 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 04 04:32:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 04:32:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 04:35:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 04:35:35 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 04:35:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 04:35:47 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 04:42:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 04:42:11 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 04 04:43:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 04:43:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 04:46:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 04:46:26 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 04 04:47:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 04:47:09 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 04:50:36 fir-md1-s1 kernel: Lustre: 31015:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564919429/real 1564919429] req@ffff8f295d7ada00 x1636754798682144/t0(0) o105->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1564919436 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 04:52:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 04:52:34 fir-md1-s1 kernel: Lustre: Skipped 397 previous similar messages Aug 04 04:53:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 04:53:34 fir-md1-s1 kernel: Lustre: Skipped 359 previous similar messages Aug 04 04:57:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 04:57:06 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 04:58:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 04:58:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 05:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 05:02:43 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Aug 04 05:03:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 05:03:36 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 04 05:07:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 05:07:46 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 05:09:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 05:09:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 05:12:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 05:12:57 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 04 05:13:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 05:13:44 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 05:18:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 05:18:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 05:21:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 05:21:28 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 05:23:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 05:23:09 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 04 05:24:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 05:24:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 05:32:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 05:32:55 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 04 05:33:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 05:33:09 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 05:33:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 05:33:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 05:34:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 05:34:22 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 05:42:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 05:42:59 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 04 05:43:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 05:43:09 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 05:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 05:44:24 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 05:53:05 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f318b216600 x1631745178882544/t0(0) o101->9d5860bb-892b-bf7a-0b1c-c6536c5f1647@10.8.23.35@o2ib6:10/0 lens 576/3264 e 1 to 0 dl 1564923190 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 05:53:05 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 04 05:53:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 05:53:07 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 05:53:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 05:53:10 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 05:53:15 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f30b2645400 x1631684181155536/t0(0) o36->bfaf32fd-a75c-1493-838b-c2682e1a6ae6@10.9.101.15@o2ib4:20/0 lens 576/2888 e 1 to 0 dl 1564923200 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 05:53:15 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 04 05:53:19 fir-md1-s1 kernel: Lustre: 23634:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f111586e000 x1631684181162240/t0(0) o36->bfaf32fd-a75c-1493-838b-c2682e1a6ae6@10.9.101.15@o2ib4:24/0 lens 576/2888 e 1 to 0 dl 1564923204 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 05:53:19 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2615a66300/0x5d9ee6abff2ed4ad lrc: 3/0,0 mode: PR/PR res: [0x2c002c5c4:0x1960b:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a22690bbb1 expref: 43 pid: 22285 timeout: 4038259 lvb_type: 0 Aug 04 05:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 05:56:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 05:56:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 05:56:06 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 05:59:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 05:59:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 06:00:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 927ebcad-3373-a003-8433-ef313bb0111b (at 10.8.15.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3312bc8c00, cur 1564923654 expire 1564923504 last 1564923427 Aug 04 06:00:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 04 06:03:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 06:03:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 06:03:41 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 04 06:05:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 06:05:23 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 06:06:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 06:06:06 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 06:09:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 06:09:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 06:13:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 06:13:41 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 04 06:15:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 06:15:38 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 06:16:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 06:16:54 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 06:21:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 06:21:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 06:23:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 06:23:52 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 04 06:27:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 06:27:09 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 06:28:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 06:28:57 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 06:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 06:34:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 06:38:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 
10.8.12.12@o2ib6) reconnecting Aug 04 06:38:00 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 04 06:39:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 06:39:17 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 06:43:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 06:43:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 06:44:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 06:44:20 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 04 06:47:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e40e39400, cur 1564926457 expire 1564926307 last 1564926230 Aug 04 06:47:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 04 06:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 06:48:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 06:48:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 06:49:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 06:49:26 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 06:51:19 fir-md1-s1 kernel: Lustre: 21676:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ea5030f00 x1631718787763216/t0(0) o101->5bf06290-4060-c1df-d0e7-c92915b19d41@10.8.26.6@o2ib6:24/0 lens 576/3264 e 1 to 0 dl 1564926684 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 06:51:19 fir-md1-s1 kernel: Lustre: 21676:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 04 06:51:33 fir-md1-s1 kernel: Lustre: 21145:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3985efc500 x1634134925547184/t0(0) o36->bf244a56-6162-688a-5a5d-b94ea7dbce3e@10.9.108.3@o2ib4:8/0 lens 584/2888 e 0 to 0 dl 1564926698 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 06:51:33 fir-md1-s1 kernel: Lustre: 21145:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 04 06:51:55 fir-md1-s1 kernel: Lustre: 23587:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f4044885400 x1633939745459328/t0(0) o36->45ea93d0-601e-7c75-247d-e7b91b654603@10.9.101.53@o2ib4:0/0 lens 744/2888 e 0 to 0 dl 1564926720 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 06:53:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 06:53:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 06:54:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 06:54:55 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 04 06:59:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 06:59:05 fir-md1-s1 kernel: Lustre: Skipped 78685 previous similar messages Aug 04 07:00:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 07:01:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 07:01:18 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 07:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 07:05:29 fir-md1-s1 kernel: Lustre: Skipped 78683 previous similar messages Aug 04 07:09:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 07:09:09 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 07:11:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 07:11:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 07:11:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 07:11:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 07:15:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 07:15:51 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 04 07:19:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 07:19:30 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 04 07:22:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 07:22:34 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 07:26:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 07:26:07 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 07:27:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 07:27:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 07:29:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 07:29:34 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 07:32:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 07:32:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 07:36:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 07:36:32 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 04 07:40:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 07:40:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 07:43:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 07:43:56 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 07:44:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 07:44:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 07:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 07:46:35 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 04 07:50:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 07:50:51 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 04 07:54:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 07:54:23 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 04 07:55:31 fir-md1-s1 kernel: Lustre: 21434:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f159bbead00 x1631771889946912/t0(0) o36->236cf0e9-1d9b-0604-f09b-9a800534708f@10.8.24.29@o2ib6:6/0 lens 528/2888 e 1 to 0 dl 1564930536 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 07:55:31 fir-md1-s1 kernel: Lustre: 21434:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 04 07:55:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f233b3e8d80/0x5d9ee6ac3133b39a lrc: 3/0,0 mode: PR/PR res: [0x2c002c5c4:0x1960b:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a22690d7e9 expref: 42 pid: 20728 timeout: 4045605 lvb_type: 0 Aug 04 07:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 07:56:37 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 04 07:57:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 04 07:57:02 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 08:01:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 08:01:01 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 08:04:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 08:04:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 08:07:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 08:07:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 08:07:13 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 08:07:13 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 04 08:11:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 08:11:25 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 08:15:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 08:15:36 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 08:17:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 08:17:14 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 04 08:21:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 08:21:52 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 04 08:23:36 fir-md1-s1 
kernel: Lustre: 22427:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06b1a5a850 x1638903930249632/t0(0) o3->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:11/0 lens 488/440 e 1 to 0 dl 1564932221 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 08:23:36 fir-md1-s1 kernel: Lustre: 22427:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Aug 04 08:25:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 08:25:49 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 04 08:27:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 08:27:22 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 04 08:31:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 08:31:40 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 08:31:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 08:31:55 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 08:34:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 08:34:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 08:37:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 08:37:02 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 04 08:37:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 08:37:24 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 04 08:38:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 08:42:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 08:42:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 08:46:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 08:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 08:47:38 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 04 08:48:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 08:48:17 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 08:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 08:52:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 08:58:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 08:58:02 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 08:58:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 08:58:32 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 04 08:59:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 08:59:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 09:01:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ddb614800, cur 1564934509 expire 1564934359 last 1564934282 Aug 04 09:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 09:02:24 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 09:08:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 09:08:27 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 04 09:08:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 09:08:35 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 04 09:10:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 09:12:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 09:12:55 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 09:18:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 09:18:30 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 04 09:18:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 09:18:55 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 09:21:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 09:21:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 09:22:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20d5ca3c00, cur 1564935737 expire 1564935587 last 1564935510 Aug 04 09:23:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 09:23:42 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 04 09:28:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 09:28:39 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 04 09:29:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 09:29:23 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 04 09:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 09:33:57 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 09:38:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 09:38:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 09:38:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 09:38:48 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 04 09:42:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f50515800, cur 1564936924 expire 1564936774 last 1564936697 Aug 04 09:42:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 09:42:46 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 09:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 09:44:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 09:48:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 09:49:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 09:49:23 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 04 09:53:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 09:53:27 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 09:54:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 09:54:14 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 09:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 09:59:29 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 04 10:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f162c4bf400, cur 1564938174 expire 1564938024 last 1564937947 Aug 04 10:04:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 04 10:04:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 10:04:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 10:04:43 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 04 10:04:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 04 10:04:55 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 10:09:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 10:09:45 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 04 10:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 10:14:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 10:15:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 10:15:07 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 10:19:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 10:19:51 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 04 10:23:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 10:23:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 10:25:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 10:25:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 10:26:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 10:26:49 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 04 10:29:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 298803cf-e753-941c-6112-fbcd4a68f381 (at 10.8.27.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2505887800, cur 1564939791 expire 1564939641 last 1564939564 Aug 04 10:29:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 10:29:57 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 04 10:35:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 10:35:09 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 10:35:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 10:35:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 10:37:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 10:37:04 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 04 10:40:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 10:40:00 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 04 10:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 10:45:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 10:49:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 10:49:11 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 10:50:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 10:50:02 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 04 10:50:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 10:50:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 10:55:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 10:55:12 fir-md1-s1 kernel: Lustre: Skipped 583 previous similar messages Aug 04 11:00:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 11:00:13 fir-md1-s1 kernel: Lustre: Skipped 626 previous similar messages Aug 04 11:00:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 11:00:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 11:01:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 11:01:15 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 04 11:05:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 11:05:17 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 11:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 11:10:53 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 11:11:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 11:11:52 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 11:12:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 11:12:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 11:15:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 11:15:55 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 11:20:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 11:20:55 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 11:22:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 11:22:10 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 04 11:23:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 11:23:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 11:26:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 11:26:19 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 11:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 11:31:04 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 11:32:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 11:32:12 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 11:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 11:36:37 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 11:38:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 11:38:35 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 04 11:41:15 fir-md1-s1 kernel: Lustre: 10147:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564944068/real 1564944068] req@ffff8f3901003300 x1636754912462832/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564944075 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 11:41:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 11:41:42 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 04 11:42:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 11:42:35 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 11:42:47 fir-md1-s1 kernel: Lustre: 10304:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0cab617200 x1637990618021776/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1564944172 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 11:44:08 fir-md1-s1 kernel: Lustre: 10308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564944241/real 1564944241] req@ffff8f0d98a90c00 x1636754913261344/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564944248 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 11:47:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 11:47:23 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 11:49:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 04 11:49:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 11:52:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 11:52:05 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 04 11:54:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 11:54:36 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 04 11:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 11:57:24 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 04 12:02:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 12:02:05 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 04 12:07:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 12:07:41 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 12:08:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 12:08:10 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 04 12:10:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 12:10:23 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 12:12:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 12:12:11 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 12:15:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23b05ce800, cur 1564946105 expire 1564945955 last 1564945878 Aug 04 12:15:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 04 12:17:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 12:17:49 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 04 12:21:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 12:21:12 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 12:22:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 12:22:16 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 04 12:23:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 12:23:49 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 04 12:28:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 12:28:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 12:31:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 12:31:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 12:32:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 12:32:17 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 12:33:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 12:33:50 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 12:38:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 12:38:40 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 12:43:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 12:43:09 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 04 12:45:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing 
former export from same NID Aug 04 12:45:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 04 12:45:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 12:45:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 12:48:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 12:48:43 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 12:53:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 12:53:11 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 04 12:55:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 12:55:20 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 12:59:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 12:59:18 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 13:03:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 13:03:30 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 13:03:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 13:03:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 13:06:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 13:06:04 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 13:09:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 13:09:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 13:13:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 13:13:36 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 04 13:15:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 13:15:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 13:16:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 13:16:43 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 04 13:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 13:19:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 13:23:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 13:23:55 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 13:27:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 13:27:28 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 04 13:28:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 
(no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 13:28:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 13:29:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 13:29:33 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 04 13:33:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 13:33:55 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 13:37:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 13:37:53 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 04 13:38:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 13:38:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 04 13:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 13:39:44 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 13:44:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 13:44:03 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 04 13:49:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 13:49:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 13:50:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 13:50:30 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 04 13:53:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 13:53:54 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 04 13:54:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 13:54:09 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 04 14:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 14:00:04 fir-md1-s1 kernel: Lustre: Skipped 54113 previous similar messages Aug 04 14:00:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 14:00:36 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 14:04:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 14:04:21 fir-md1-s1 kernel: Lustre: Skipped 54132 previous similar messages Aug 04 14:06:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 14:06:57 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 04 14:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 14:10:53 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 14:14:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 14:14:18 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 14:14:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 14:14:25 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 04 14:20:25 fir-md1-s1 kernel: Lustre: 97638:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564953618/real 1564953618] req@ffff8f2021c6e900 x1636754963654128/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564953625 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 14:20:34 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564953627/real 1564953627] req@ffff8f234a60f800 x1636754963733456/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564953634 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 14:20:42 fir-md1-s1 kernel: Lustre: 24584:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f20bb212700 x1631681842113360/t0(0) o101->a9d5fe7d-8779-d05a-7912-13f3cb67d95f@10.8.22.28@o2ib6:17/0 lens 480/568 e 1 to 0 dl 1564953647 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 14:21:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 14:21:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 14:22:06 fir-md1-s1 kernel: Lustre: 
21447:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564953719/real 1564953719] req@ffff8f221af69200 x1636754964626368/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564953726 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 14:22:13 fir-md1-s1 kernel: Lustre: 22004:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564953726/real 1564953726] req@ffff8f2021d5d700 x1636754964696144/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564953733 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 14:22:21 fir-md1-s1 kernel: Lustre: 26255:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d69b25a00 x1638544403071520/t0(0) o36->1890d675-ce1f-cd8f-dea3-5b5821d43c68@10.8.0.67@o2ib6:26/0 lens 512/2888 e 1 to 0 dl 1564953746 ref 2 fl Interpret:/0/0 rc 0/0 Aug 04 14:22:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1ea5e92d00/0x5d9ee6acc8e07358 lrc: 3/0,0 mode: CR/CR res: [0x200029f8e:0x1:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226913a83 expref: 41 pid: 20722 timeout: 4068815 lvb_type: 0 Aug 04 14:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 14:24:37 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 04 14:25:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 14:25:00 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 14:29:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 04 14:29:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 14:31:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 14:31:13 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 04 14:34:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 14:34:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 14:37:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 14:37:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 04 14:41:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 14:41:23 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 14:44:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 14:45:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 14:45:08 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 04 14:47:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 14:47:59 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 04 14:51:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 14:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 14:52:02 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 04 14:55:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 14:55:24 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 04 15:00:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 15:00:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 15:00:39 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 15:02:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 15:02:29 fir-md1-s1 kernel: Lustre: Skipped 507 previous similar messages Aug 04 15:05:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 15:05:35 fir-md1-s1 kernel: Lustre: Skipped 513 previous similar messages Aug 04 15:12:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 15:12:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 04 15:12:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 15:12:29 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 15:13:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 15:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 15:15:44 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 15:22:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 15:22:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 15:22:32 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 15:22:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 15:26:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 15:26:04 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 04 15:32:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 15:32:45 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 15:33:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 15:33:27 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 15:36:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 15:36:10 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 04 15:42:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 15:42:45 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 04 15:45:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 15:45:19 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 04 15:46:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection 
restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 15:46:10 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 15:49:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 15:52:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 15:52:50 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 15:55:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 15:55:22 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 15:56:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 15:56:21 fir-md1-s1 kernel: Lustre: Skipped 482 previous similar messages Aug 04 16:03:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 16:03:02 fir-md1-s1 kernel: Lustre: Skipped 434 previous similar messages Aug 04 16:06:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 16:06:27 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 16:06:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 16:06:27 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 04 16:12:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 16:13:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 16:13:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 04 16:15:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 16:16:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 04 16:16:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 04 16:16:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 16:16:31 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 16:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 16:23:21 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 16:26:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 16:26:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 16:26:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 04 16:26:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 16:26:35 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 16:33:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 16:33:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 16:37:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 16:37:06 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 04 16:38:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 16:38:57 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 16:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 16:44:10 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 16:44:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 16:47:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 16:47:11 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 04 16:50:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 16:50:59 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 04 16:54:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 16:54:24 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 16:56:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 16:57:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 16:57:12 fir-md1-s1 kernel: Lustre: Skipped 336 previous similar messages Aug 04 17:01:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 17:01:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 17:01:43 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 04 17:04:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 17:04:29 fir-md1-s1 kernel: Lustre: Skipped 315 previous similar messages Aug 04 17:07:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 17:07:14 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 04 17:09:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 17:11:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 17:11:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 17:11:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 04 17:12:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 17:13:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 17:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 17:14:41 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 17:17:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 17:17:22 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 04 17:23:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 17:23:11 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 04 17:24:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 17:24:44 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 04 17:27:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 17:27:34 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 04 17:30:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 17:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 17:34:45 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 04 17:34:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 17:34:49 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 04 17:37:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 17:37:39 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Aug 04 17:44:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 17:44:58 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 04 17:46:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 17:46:01 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 04 17:47:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 17:47:47 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 04 17:55:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 17:55:12 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 17:57:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 17:58:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 17:58:01 fir-md1-s1 kernel: Lustre: Skipped 291 previous similar messages Aug 04 17:58:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 17:58:17 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 18:02:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:03:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:05:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 18:05:37 fir-md1-s1 kernel: Lustre: Skipped 249 previous similar messages Aug 04 18:08:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 18:08:21 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 04 18:08:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 18:08:21 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 04 18:09:51 fir-md1-s1 kernel: Lustre: 21371:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564967384/real 1564967384] req@ffff8f326e5a5700 x1636755043488576/t0(0) o104->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564967391 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 18:09:51 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1564967384/real 1564967384] req@ffff8f293f294e00 x1636755043488496/t0(0) o104->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564967391 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 18:12:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:15:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 18:15:38 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 18:18:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 18:18:47 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 18:18:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 18:18:47 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 04 18:19:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:20:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:20:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:21:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 18:24:59 fir-md1-s1 kernel: Lustre: 23750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564968292/real 1564968292] req@ffff8f326e5a0600 x1636755048142800/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564968299 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 18:25:06 fir-md1-s1 kernel: Lustre: 23750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564968299/real 1564968299] req@ffff8f326e5a0600 x1636755048146912/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564968306 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 18:25:41 fir-md1-s1 kernel: Lustre: 23695:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564968334/real 1564968334] req@ffff8f2e7c501200 x1636755048171712/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564968341 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 18:25:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 18:25:50 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 18:26:02 fir-md1-s1 kernel: Lustre: 23750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564968355/real 1564968355] req@ffff8f2661b9bf00 x1636755048189888/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564968362 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 04 18:29:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 18:29:01 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 04 18:29:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 18:29:28 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 18:36:05 
fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 18:36:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 18:39:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 18:39:50 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 18:41:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 18:41:25 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 04 18:41:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:46:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 18:46:47 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 18:49:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 18:49:54 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 04 18:52:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 18:52:11 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 04 18:53:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 18:55:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 18:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 18:56:58 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 04 18:59:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 18:59:56 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 04 19:00:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 19:02:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 19:02:15 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 04 19:02:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15e8f46400, cur 1564970553 expire 1564970403 last 1564970326 Aug 04 19:07:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 19:07:09 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 19:08:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 19:08:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 19:09:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 19:09:58 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 04 19:10:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 19:13:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 19:13:05 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 04 19:16:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 19:17:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 19:17:31 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 19:18:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 19:20:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 19:20:12 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 04 19:25:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 19:25:13 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 04 19:27:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 19:27:32 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 19:30:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 19:30:14 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 04 19:35:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 19:35:19 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 04 19:37:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 19:37:34 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 19:40:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 19:40:14 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Aug 04 19:46:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 19:46:20 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 04 19:48:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 19:48:17 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 04 19:50:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection 
restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 19:50:23 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 04 19:57:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 19:57:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 04 19:58:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 19:58:26 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 19:58:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 19:58:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 20:00:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 20:00:27 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 04 20:07:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 20:07:30 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 04 20:08:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 20:08:27 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 04 20:10:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 20:10:34 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 04 20:18:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 20:18:30 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 20:18:50 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 20:18:50 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 20:19:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 20:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 20:20:41 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 04 20:29:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 20:29:34 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 20:29:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 20:29:47 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 04 20:31:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 20:31:16 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 20:33:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 20:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 20:39:39 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 04 20:40:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 04 20:40:00 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 04 20:41:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 20:41:54 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 04 20:42:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 20:49:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 20:49:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 20:49:41 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 20:50:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 20:50:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 20:51:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 20:51:56 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 04 20:52:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 20:53:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16a38b0000, cur 1564977219 expire 1564977069 last 1564976992 Aug 04 20:53:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 20:54:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 20:56:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 20:57:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 21:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 21:00:02 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 04 21:02:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 21:02:05 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 04 21:02:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 21:02:05 fir-md1-s1 kernel: Lustre: Skipped 497 previous similar messages Aug 04 21:06:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f308c9efc00, cur 1564978012 expire 1564977862 last 1564977785 Aug 04 21:08:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 21:08:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 04 21:10:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 21:10:19 fir-md1-s1 kernel: Lustre: Skipped 462 previous similar messages Aug 04 21:12:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 21:12:19 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 04 21:12:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 21:12:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 21:20:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 21:20:36 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 04 21:21:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 21:22:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 21:22:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 21:22:28 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 04 21:22:28 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 04 21:23:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 21:30:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 21:30:42 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 21:32:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 21:32:49 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 04 21:32:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 21:32:49 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 04 21:39:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 21:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 04 21:40:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 21:43:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 21:43:00 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 21:43:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 04 21:43:00 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 04 21:51:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 21:51:15 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 21:53:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 21:53:04 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 04 21:53:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 21:53:04 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 22:01:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 22:01:15 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 22:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 04 22:03:04 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 04 22:03:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 22:03:10 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 22:03:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: 
not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 22:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 22:11:15 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 22:11:54 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 04 22:13:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 22:13:26 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 04 22:13:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 22:13:26 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 04 22:21:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 22:21:18 fir-md1-s1 kernel: Lustre: Skipped 17396 previous similar messages Aug 04 22:23:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 22:24:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 22:24:03 fir-md1-s1 kernel: Lustre: Skipped 17460 previous similar messages Aug 04 22:24:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 22:24:43 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 22:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 22:31:27 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 04 22:34:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 22:34:03 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 04 22:35:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 22:35:11 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 04 22:41:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 22:41:40 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 04 22:44:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 04 22:44:03 fir-md1-s1 kernel: Lustre: Skipped 124113 previous similar messages Aug 04 22:45:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 22:45:12 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 22:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 04 22:52:02 fir-md1-s1 kernel: Lustre: Skipped 124069 previous similar messages Aug 04 22:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 22:54:04 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 04 22:56:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 22:56:18 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 04 23:00:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:01:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:02:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 23:02:05 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 04 23:02:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:03:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:04:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 23:04:06 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 04 23:04:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 23:06:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:07:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 23:07:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 04 23:09:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:12:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 23:12:07 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 04 23:14:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 04 23:14:17 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 04 23:15:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2199fdd000, cur 1564985754 expire 1564985604 last 1564985527 Aug 04 23:17:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:19:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 04 23:19:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 04 23:20:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 23:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 23:22:33 fir-md1-s1 kernel: Lustre: Skipped 67952 previous similar messages Aug 04 23:24:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 23:24:18 fir-md1-s1 kernel: Lustre: Skipped 67976 previous similar messages Aug 04 23:25:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:25:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 04 23:29:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 23:29:53 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 04 23:32:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 04 23:32:35 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 23:34:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 23:34:55 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 04 23:38:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 04 23:41:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 23:41:52 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 04 23:43:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 04 23:43:10 fir-md1-s1 kernel: Lustre: Skipped 42362 previous similar messages Aug 04 23:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 23:45:07 fir-md1-s1 kernel: Lustre: Skipped 42389 previous similar messages Aug 04 23:53:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 04 23:53:15 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 04 23:53:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 04 23:53:19 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 04 23:53:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 04 23:53:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 04 23:55:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 04 23:55:09 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 00:03:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 00:03:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 00:03:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 00:03:21 fir-md1-s1 kernel: Lustre: Skipped 125 previous similar messages Aug 05 00:04:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 00:04:17 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 05 00:05:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 00:05:10 fir-md1-s1 kernel: Lustre: Skipped 143 previous similar messages Aug 05 00:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 00:13:37 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 00:14:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 00:14:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 00:15:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 00:15:12 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 05 00:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 00:23:41 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 00:24:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 00:24:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 00:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 00:25:29 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 05 00:26:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 00:27:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 00:27:41 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 05 00:32:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 00:34:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 00:34:06 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 00:35:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 00:35:37 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 05 00:38:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 00:38:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 00:38:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 00:38:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 05 00:44:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 00:44:43 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 00:46:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 00:46:03 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 05 00:49:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 00:49:12 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 00:52:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 00:52:02 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 00:54:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 00:54:49 fir-md1-s1 kernel: Lustre: Skipped 29321 previous similar messages Aug 05 00:56:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 00:56:44 fir-md1-s1 kernel: Lustre: Skipped 29350 previous similar messages Aug 05 00:59:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 05 00:59:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 01:04:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 01:04:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 01:04:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 01:04:59 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 01:06:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 01:06:55 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 05 01:09:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 01:09:25 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 05 01:14:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 01:14:50 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 01:16:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 01:16:14 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 01:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 01:18:31 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 05 01:20:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 01:20:18 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 05 01:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 01:26:15 fir-md1-s1 kernel: Lustre: Skipped 28843 previous similar messages Aug 05 01:28:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 01:28:34 fir-md1-s1 kernel: Lustre: Skipped 28882 previous similar messages Aug 05 01:30:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 01:30:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 01:30:24 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 05 01:36:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 01:36:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 05 01:38:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 01:38:38 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 05 01:40:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 01:40:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 01:40:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 01:40:54 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 05 01:47:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 01:47:06 fir-md1-s1 kernel: Lustre: Skipped 43464 previous similar messages Aug 05 01:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 01:49:11 fir-md1-s1 kernel: Lustre: Skipped 43505 previous similar messages Aug 05 01:51:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 01:51:38 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 01:52:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 01:52:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 01:57:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 01:57:26 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 05 01:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 01:59:58 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 05 02:03:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 02:03:14 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 02:07:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 02:07:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 02:07:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 02:07:30 fir-md1-s1 kernel: Lustre: Skipped 29051 previous similar messages Aug 05 02:10:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 02:10:32 fir-md1-s1 kernel: Lustre: Skipped 29084 previous similar messages Aug 05 02:13:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 02:13:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 05 02:17:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 02:17:34 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 02:20:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 02:20:34 fir-md1-s1 kernel: Lustre: Skipped 14165 previous similar messages Aug 05 02:25:45 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564997138/real 1564997138] req@ffff8f2fc84faa00 x1636755168651152/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564997145 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 02:25:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 02:25:50 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 05 02:25:52 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564997145/real 1564997145] req@ffff8f2fc84faa00 x1636755168651152/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564997152 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 05 02:25:53 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f23823f00 x1631766254756016/t0(0) o101->c6e7a245-976c-f1da-2930-5dafca10acda@10.8.31.8@o2ib6:28/0 lens 480/568 e 1 to 0 dl 1564997158 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 02:25:59 fir-md1-s1 kernel: Lustre: 23740:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564997152/real 1564997152] req@ffff8f2fc84faa00 x1636755168651152/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1564997159 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 05 02:28:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 02:28:06 fir-md1-s1 kernel: Lustre: Skipped 14118 previous similar messages Aug 05 02:30:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 02:30:42 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 05 02:36:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 02:36:51 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 05 02:38:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 02:38:18 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 05 02:38:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 02:38:36 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 02:41:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 02:41:07 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 05 02:46:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 02:46:55 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 02:49:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 02:49:00 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 02:51:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 02:51:12 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 05 02:58:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 02:58:03 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 05 02:59:53 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 02:59:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 03:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 03:01:16 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 05 03:04:34 fir-md1-s1 kernel: Lustre: 10148:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564999467/real 1564999467] req@ffff8f278a3eda00 x1636755178466160/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1564999474 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 05 03:07:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 03:08:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 03:08:03 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 05 03:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 03:10:13 fir-md1-s1 kernel: Lustre: Skipped 369 previous similar messages Aug 05 03:11:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 03:11:19 fir-md1-s1 kernel: Lustre: Skipped 381 previous similar messages Aug 05 03:13:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 91598d05-70f1-2125-3b77-5fd61a214bf1 (at 10.9.102.66@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45191ba000, cur 1565000039 expire 1564999889 last 1564999812 Aug 05 03:14:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 05 03:15:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 03:18:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 03:18:17 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 05 03:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 03:20:21 fir-md1-s1 kernel: Lustre: Skipped 39219 previous similar messages Aug 05 03:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 03:21:43 fir-md1-s1 kernel: Lustre: Skipped 39271 previous similar messages Aug 05 03:30:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 03:30:38 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 05 03:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 03:31:09 fir-md1-s1 kernel: Lustre: Skipped 10623 previous similar messages Aug 05 03:32:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 03:32:02 fir-md1-s1 kernel: Lustre: Skipped 10622 previous similar messages Aug 05 03:33:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 03:34:52 fir-md1-s1 kernel: LustreError: 21540:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f25313ef050 x1631353677291024/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:10/0 lens 488/448 e 1 to 0 dl 1565001310 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 03:34:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 05 03:40:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 03:40:44 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 03:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 03:41:25 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 03:42:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 03:42:04 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 05 03:48:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 03:49:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 03:50:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 03:50:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 03:50:56 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 03:51:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 03:51:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 03:51:28 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 03:52:02 fir-md1-s1 kernel: Lustre: 10146:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f280efebc00 x1634934840482640/t356712818496(0) o36->8f367c70-6bbd-359c-a9cb-016bde9e7ec3@10.8.27.12@o2ib6:7/0 lens 488/3152 e 1 to 0 dl 1565002327 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 03:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0204f910-8cc4-cf63-febe-cbb4c5835fb5 (at 10.8.27.12@o2ib6) Aug 05 03:52:08 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 05 03:52:16 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0864c2de80/0x5d9ee6ade0fe1870 lrc: 3/0,0 mode: PR/PR res: [0x2c002bf23:0x36dc:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a2269226df expref: 257 pid: 21127 timeout: 4117396 lvb_type: 0 Aug 05 04:01:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 04:01:55 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 05 04:02:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from 
same NID Aug 05 04:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 04:02:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 04:02:23 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 04:10:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 04:11:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 04:11:56 fir-md1-s1 kernel: Lustre: Skipped 13307 previous similar messages Aug 05 04:12:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 04:12:25 fir-md1-s1 kernel: Lustre: Skipped 13332 previous similar messages Aug 05 04:12:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 04:12:26 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 04:14:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 04:21:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 04:21:59 fir-md1-s1 kernel: Lustre: Skipped 111843 previous similar messages Aug 05 04:22:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 04:22:27 fir-md1-s1 kernel: Lustre: Skipped 111895 previous similar messages Aug 05 04:22:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 04:22:33 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 05 04:32:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 04:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 04:32:19 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 04:32:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 04:32:28 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 05 04:32:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 04:32:39 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 04:35:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 04:36:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 04:38:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 04:39:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 04:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 04:42:59 fir-md1-s1 kernel: Lustre: Skipped 12743 previous similar messages Aug 05 04:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 04:42:59 fir-md1-s1 kernel: Lustre: Skipped 12778 previous similar messages Aug 05 04:43:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 04:43:30 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 05 04:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 04:53:12 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 04:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 04:53:12 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 05 04:54:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 04:55:31 fir-md1-s1 kernel: Lustre: 22287:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f182216e000 x1639154469645648/t0(0) o101->ec7203e3-70bf-29e9-bb08-bb4d33e58ceb@10.9.104.25@o2ib4:6/0 lens 1792/3288 e 1 to 0 dl 1565006136 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 04:55:31 fir-md1-s1 kernel: Lustre: 22287:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 05 04:55:45 fir-md1-s1 kernel: Lustre: 20732:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2a0214ec00 x1631616111735824/t0(0) o101->2760e021-c1fe-d2a9-3b01-eeefd52010e6@10.8.7.5@o2ib6:20/0 lens 584/3264 e 0 to 0 dl 1565006150 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 04:55:45 fir-md1-s1 kernel: Lustre: 20732:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 05 04:55:46 fir-md1-s1 kernel: Lustre: 23645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f34807f3c00 x1631705848295792/t0(0) o101->acb1aa3b-60ab-7f7c-ec38-03838117cd24@10.8.25.12@o2ib6:21/0 lens 584/3264 e 0 to 0 dl 1565006151 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 04:55:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f27da99b840/0x5d9ee6adf95027f2 lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 98 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226924f18 expref: 62 pid: 10146 timeout: 4121206 lvb_type: 0 Aug 05 04:56:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 04:56:38 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 04:57:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 04:58:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 04:58:46 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 05 04:59:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 05:03:25 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 05:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 05:03:25 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 05 05:03:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:09:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 05:09:29 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 05 05:10:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a408bd94-ffaa-278d-8988-76468f5d0876 (at 10.8.16.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2b7e7e1c00, cur 1565007001 expire 1565006851 last 1565006774 Aug 05 05:10:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 05 05:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4be13f91-94ff-43a7-d4ac-0956b3c28c36 (at 10.8.16.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f0f80f000, cur 1565007013 expire 1565006863 last 1565006786 Aug 05 05:11:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:11:53 fir-md1-s1 kernel: Lustre: 20511:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565007106/real 1565007106] req@ffff8f1f300c9500 x1636755210173136/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565007113 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 05:12:00 fir-md1-s1 kernel: Lustre: 20511:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565007113/real 1565007113] req@ffff8f1f300c9500 x1636755210173136/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565007120 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 05 05:12:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:13:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 05:13:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 05:13:38 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 05 05:13:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 05:13:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 05 05:19:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 05 05:19:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 05 05:20:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:23:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 05:23:43 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 05:23:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 05:23:49 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 05 05:29:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 05 05:29:47 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 05 05:33:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 05:33:44 fir-md1-s1 kernel: Lustre: Skipped 14195 previous similar messages Aug 05 05:33:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 05:33:54 fir-md1-s1 kernel: Lustre: Skipped 14171 previous similar messages Aug 05 05:35:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:35:30 fir-md1-s1 kernel: LustreError: 24569:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f350c67d450 x1632261252640112/t0(0) o4->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:18/0 lens 504/448 e 1 to 0 dl 1565008548 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 05:35:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6), client will retry: rc = -110 Aug 05 05:39:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 05:39:48 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 05:43:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 05:43:46 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 05 05:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 05:44:08 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 05:49:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 05:49:51 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 05 05:51:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 05:53:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 05:53:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 05:53:50 fir-md1-s1 kernel: Lustre: Skipped 41247 previous similar messages Aug 05 05:54:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 05:54:33 fir-md1-s1 kernel: Lustre: Skipped 41209 previous similar messages Aug 05 05:56:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:00:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 05 06:00:29 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 05 06:00:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 06:04:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 06:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 06:04:49 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 05 06:07:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:09:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 06:09:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 06:10:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 06:10:53 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 05 06:11:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 06:15:35 fir-md1-s1 kernel: Lustre: Skipped 2723 previous similar messages Aug 05 06:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 06:15:35 fir-md1-s1 kernel: Lustre: Skipped 2743 previous similar messages Aug 05 06:16:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:16:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 06:21:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 05 06:21:56 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 06:24:52 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34f098a800, cur 1565011492 expire 1565011342 last 1565011265 Aug 05 06:24:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 06:25:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:25:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 06:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 06:25:48 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 06:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 06:25:48 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 05 06:33:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 06:33:38 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 06:35:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 06:35:56 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 05 06:35:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 06:35:57 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 05 06:43:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 06:43:41 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 06:45:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 06:45:58 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Aug 05 06:46:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 06:46:05 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 06:49:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b 
(at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fdbe3400, cur 1565012948 expire 1565012798 last 1565012721 Aug 05 06:50:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 06:53:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 06:53:49 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 06:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 06:56:00 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 05 06:56:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 06:56:12 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 06:56:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 06:56:37 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 07:00:51 fir-md1-s1 kernel: LustreError: 21716:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f25313ef050 x1631686857280528/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:26/0 lens 504/448 e 0 to 0 dl 1565013656 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 07:00:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Aug 05 07:05:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 05 07:05:14 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 05 07:06:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 07:06:01 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 07:06:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 07:06:40 fir-md1-s1 kernel: Lustre: Skipped 8166 previous similar messages Aug 05 07:10:24 fir-md1-s1 kernel: LustreError: 48201:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f350c67d450 x1631686857349536/t0(0) o4->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:29/0 lens 504/448 e 0 to 0 dl 1565014229 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 07:10:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6), client will retry: rc = -110 Aug 05 07:10:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 07:11:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 07:16:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 07:16:13 fir-md1-s1 kernel: Lustre: Skipped 13825 previous similar messages Aug 05 07:16:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 05 07:16:20 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 07:16:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 07:16:40 fir-md1-s1 kernel: Lustre: Skipped 5675 previous similar messages Aug 05 07:20:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 07:21:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 07:25:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 07:25:39 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 07:26:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 07:26:23 fir-md1-s1 kernel: Lustre: Skipped 22002 previous similar messages Aug 05 07:26:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 07:26:50 fir-md1-s1 kernel: Lustre: Skipped 21980 previous similar messages Aug 05 07:27:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 07:27:21 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 05 07:32:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 07:32:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 07:36:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 07:36:44 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 05 07:36:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 07:36:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 07:39:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 05 07:39:28 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 05 07:39:32 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 05 07:44:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2f6bf0b400, cur 1565016274 expire 1565016124 last 1565016047 Aug 05 07:44:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 07:44:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 05 07:46:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 07:46:44 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 07:46:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 07:46:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 07:49:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 07:49:39 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 07:54:55 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f402dbcb000, cur 1565016895 expire 1565016745 last 1565016668 Aug 05 07:56:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 07:56:37 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 07:57:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 07:57:05 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 07:57:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 07:57:10 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 08:01:44 fir-md1-s1 kernel: Lustre: 27321:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f16f4e1ad00 x1639519824921520/t0(0) o101->04c17dce-45f1-fe7e-2627-7efeaaeaddb9@10.9.0.62@o2ib4:19/0 lens 480/568 e 1 to 0 dl 1565017309 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 08:01:44 fir-md1-s1 kernel: Lustre: 27321:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 05 08:01:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f3268d40fc0/0x5d9ee6ae1ff84f10 lrc: 3/0,0 mode: PW/PW res: [0x20001254e:0x7:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85bc4100 expref: 99 pid: 10585 timeout: 4132378 lvb_type: 0 Aug 05 08:02:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 08:02:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 05 08:07:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 08:07:48 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 08:07:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 
08:07:48 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 08:08:25 fir-md1-s1 kernel: Lustre: 21671:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ae0aabf00 x1638775364098624/t0(0) o101->63331ef8-e7f6-019b-65ae-b4aad7ec4d2c@10.8.14.7@o2ib6:0/0 lens 576/3264 e 1 to 0 dl 1565017710 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 08:08:43 fir-md1-s1 kernel: Lustre: 23613:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3e1ae8fb00 x1639301898993408/t0(0) o36->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:18/0 lens 560/2888 e 0 to 0 dl 1565017728 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 08:08:43 fir-md1-s1 kernel: Lustre: 23613:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 05 08:12:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 08:12:52 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 05 08:16:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 08:16:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 08:17:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 08:17:59 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 05 08:17:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 08:17:59 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 08:20:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f0bacb87800, cur 1565018407 expire 1565018257 last 1565018180 Aug 05 08:24:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 08:24:11 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 05 08:28:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 08:28:20 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 05 08:28:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 08:28:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 05 08:30:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 08:30:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 08:38:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 08:38:31 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 08:38:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 08:38:31 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 05 08:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 08:38:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 08:41:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 08:41:28 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 08:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 08:48:34 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 05 08:48:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 08:48:36 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 05 08:49:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 08:49:13 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 08:53:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 08:53:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 05 08:59:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 08:59:37 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 05 08:59:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 08:59:37 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 05 09:00:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 05 09:00:17 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 05 09:06:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 09:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 09:10:16 fir-md1-s1 kernel: Lustre: Skipped 618 previous similar messages Aug 05 09:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 09:10:16 fir-md1-s1 kernel: Lustre: Skipped 641 previous similar messages Aug 05 09:11:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 05 09:11:09 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 05 09:11:49 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1c08ab1f80/0x5d9ee6ae33d75ee2 lrc: 3/0,0 mode: PR/PR res: [0x2c002c7cc:0x14:0x0].0x0 bits 0x58/0x0 rrc: 3 type: IBT flags: 0x60200400010020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a22692c9d7 expref: 51 pid: 22284 timeout: 4136569 lvb_type: 0 Aug 05 09:13:07 fir-md1-s1 kernel: Lustre: 97670:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18749ba100 x1631353690842032/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:12/0 lens 376/1600 e 1 to 0 dl 1565021592 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 09:14:10 fir-md1-s1 kernel: Lustre: 46535:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f19659f7050 x1631353690842192/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:15/0 lens 488/448 e 1 to 0 dl 1565021655 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 09:14:15 fir-md1-s1 kernel: LustreError: 22157:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f19659f7050 x1631353690842192/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:15/0 lens 
488/448 e 1 to 0 dl 1565021655 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 09:14:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 05 09:17:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 09:17:00 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 09:20:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 09:20:37 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 05 09:20:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 09:20:55 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 05 09:21:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 09:21:10 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 09:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 09:31:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 09:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 09:31:31 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 05 09:31:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 09:31:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 09:31:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 05 09:31:54 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 05 09:41:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec76f1db-9c9b-bbe0-847f-90a9d517c8dc (at 10.8.9.8@o2ib6) Aug 05 09:41:49 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 05 09:42:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 09:42:03 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 09:42:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 09:42:12 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 09:42:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 09:42:39 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 09:47:00 fir-md1-s1 kernel: Lustre: 10146:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565023613/real 1565023613] req@ffff8f2b636fb300 x1636755283023920/t0(0) o106->fir-MDT0002@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565023620 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 09:52:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 09:52:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 09:52:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 09:52:09 fir-md1-s1 kernel: Lustre: Skipped 3081 previous similar messages Aug 05 09:52:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 09:52:51 fir-md1-s1 kernel: Lustre: Skipped 3034 previous similar messages Aug 05 09:56:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 09:56:48 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 05 10:02:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 10:02:14 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 05 10:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 10:04:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 05 10:04:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 10:04:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 05 10:10:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 10:10:49 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 05 10:12:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 10:12:30 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 05 10:14:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 10:14:09 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 05 10:14:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 10:14:22 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 10:17:22 fir-md1-s1 kernel: Lustre: 23607:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2c7f349e00 x1640413387250944/t0(0) o101->b547cf7a-3ab0-2e41-c4cc-76850cd91e64@10.8.10.34@o2ib6:27/0 lens 1776/3288 e 1 to 0 dl 1565025447 ref 2 fl 
Interpret:/0/0 rc 0/0 Aug 05 10:17:22 fir-md1-s1 kernel: Lustre: 23607:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 05 10:17:36 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2283370fc0/0x5d9ee6ae4cc94340 lrc: 3/0,0 mode: PR/PR res: [0x2c002c5c4:0x1960b:0x0].0x0 bits 0x13/0x0 rrc: 32 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a22692eeba expref: 47 pid: 22285 timeout: 4140516 lvb_type: 0 Aug 05 10:22:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 10:22:31 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 05 10:23:27 fir-md1-s1 kernel: LustreError: 46591:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f25313ec450 x1632261255460752/t0(0) o4->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:2/0 lens 488/448 e 0 to 0 dl 1565025812 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 10:23:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6), client will retry: rc = -110 Aug 05 10:24:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 10:24:14 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 05 10:24:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 10:24:24 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 10:27:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 10:27:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 10:28:02 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2f21c2b850 x1632261255498000/t0(0) o4->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:7/0 lens 488/448 e 0 to 0 dl 1565026087 ref 1 fl Interpret:/0/0 rc 0/0 Aug 05 10:28:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6), client will retry: rc = -110 Aug 05 10:32:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 10:32:36 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 05 10:35:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 10:35:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 05 10:35:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 10:35:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 10:40:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 10:40:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 10:43:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 10:43:09 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 10:45:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 10:45:16 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 05 10:45:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 10:45:30 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 10:53:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 10:53:28 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 05 10:53:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d2d357800, cur 1565027638 expire 1565027488 last 1565027411 Aug 05 10:54:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 10:54:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 10:55:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 10:55:40 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 05 10:56:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 05 10:56:08 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 11:03:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 11:03:41 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 05 11:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 11:06:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 11:06:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 05 11:06:55 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 05 11:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 11:13:54 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 05 11:14:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 11:14:25 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 11:16:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 11:16:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 11:17:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 11:17:03 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 05 11:21:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 11:23:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 11:23:54 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 05 11:24:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 11:24:52 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 11:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 11:27:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 11:27:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 11:27:19 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 11:30:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 11:30:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 11:33:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 11:33:55 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 05 11:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 11:37:15 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 11:37:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 11:37:55 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 11:38:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148c7a2800, cur 1565030314 expire 1565030164 last 1565030087 Aug 05 11:40:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4518b1a400, cur 1565030400 expire 1565030250 last 1565030173 Aug 05 11:40:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 11:44:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 11:44:00 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 05 11:45:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 11:45:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 11:46:15 fir-md1-s1 kernel: Lustre: 25076:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565030768/real 1565030768] req@ffff8f2247b04800 x1636755363243616/t0(0) o105->fir-MDT0000@10.8.10.21@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1565030775 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 11:47:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 11:47:29 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 11:48:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 11:48:44 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 05 11:54:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 11:54:03 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 05 11:55:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 11:55:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 11:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 11:57:56 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 05 11:59:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 11:59:51 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 05 12:04:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 12:04:07 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 05 12:05:34 fir-md1-s1 kernel: LustreError: 21463:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.24@o2ib4 arrived at 1565031934 with bad export cookie 6746082289092404059 Aug 05 12:06:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 12:06:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 12:08:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 12:08:21 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 05 12:09:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16f9176c00, cur 1565032165 expire 1565032015 last 1565031938 Aug 05 12:10:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a1d65202-4b4c-07d9-f12d-b432157293c9 (at 10.9.115.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e6ab1b800, cur 1565032205 expire 1565032055 last 1565031978 Aug 05 12:10:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 12:11:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 12:11:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 05 12:14:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 12:14:11 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 05 12:15:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3b3ff1d800, cur 1565032534 expire 1565032384 last 1565032307 Aug 05 12:15:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 12:18:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4447a88400, cur 1565032697 expire 1565032547 last 1565032470 Aug 05 12:18:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 12:18:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 12:20:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 12:20:24 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 12:21:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 12:21:11 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 12:24:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 12:24:31 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 05 12:24:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0074f13d-7764-019e-fa05-08395204d95a (at 10.9.112.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2169482800, cur 1565033094 expire 1565032944 last 1565032867 Aug 05 12:28:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16ebc13400, cur 1565033315 expire 1565033165 last 1565033088 Aug 05 12:28:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 05 12:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 12:29:29 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 12:33:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 12:33:19 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 12:34:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 12:34:36 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 12:39:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 12:39:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 
12:39:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 12:39:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 12:44:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 12:44:49 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 05 12:47:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d853c3800, cur 1565034443 expire 1565034293 last 1565034216 Aug 05 12:49:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 05 12:49:00 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 12:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 12:49:46 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 12:54:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 12:54:51 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 05 12:55:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 12:55:10 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 05 12:59:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 12:59:01 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 05 13:00:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 13:00:03 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 13:05:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 13:05:01 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 05 13:10:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 13:10:09 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 13:10:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 13:10:19 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 13:13:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 13:13:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 13:13:34 fir-md1-s1 kernel: Lustre: 31009:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565036007/real 1565036007] req@ffff8f30f20a5700 x1636755425525120/t0(0) o105->fir-MDT0000@10.8.12.12@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1565036014 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 13:15:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 13:15:03 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Aug 05 13:20:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 13:20:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 13:20:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 13:20:23 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 05 13:25:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 13:25:09 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 13:25:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 13:25:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 13:29:42 fir-md1-s1 kernel: LustreError: 31016:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.24@o2ib4 arrived at 1565036982 with bad export cookie 6746082929221618861 Aug 05 13:30:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 13:30:14 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 13:30:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 13:30:39 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 13:33:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3a98f6c000, cur 1565037209 expire 1565037059 last 1565036982 Aug 05 13:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fab118800, cur 1565037265 expire 1565037115 last 1565037038 Aug 05 13:35:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 13:35:14 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 05 13:36:10 fir-md1-s1 kernel: LustreError: 20367:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.24@o2ib4 arrived at 1565037370 with bad export cookie 6746082929562193137 Aug 05 13:36:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 13:36:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 13:40:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 13:40:15 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 05 13:40:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 13:40:42 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 13:41:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a3ce28800, cur 1565037684 expire 1565037534 last 1565037457 Aug 05 13:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 13:46:00 fir-md1-s1 kernel: Lustre: Skipped 147337 previous similar messages Aug 05 13:48:58 fir-md1-s1 kernel: Lustre: 21679:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565038131/real 1565038131] req@ffff8f346180c200 x1636755447427264/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565038138 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 13:50:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 13:50:24 fir-md1-s1 kernel: Lustre: Skipped 147310 previous similar messages Aug 05 13:51:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f17d7099800, cur 1565038284 expire 1565038134 last 1565038057 Aug 05 13:51:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 13:54:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 13:54:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 13:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b61e01000, cur 1565038442 expire 1565038292 last 1565038215 Aug 05 13:56:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 13:56:02 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 05 13:58:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 13:58:42 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 14:00:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 14:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 14:00:52 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 05 14:04:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 14:04:01 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 05 14:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 14:06:04 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 05 14:07:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bc12ebc00, cur 1565039279 expire 1565039129 last 1565039052 Aug 05 14:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 14:10:56 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 14:15:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 14:15:24 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 05 14:16:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 14:16:05 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 14:21:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 14:21:12 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 14:26:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 14:26:06 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 
14:26:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 14:26:14 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 14:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 14:31:23 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 14:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 14:36:37 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Aug 05 14:37:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 14:37:04 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 14:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 14:41:26 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 05 14:45:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 14:45:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 14:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 14:46:51 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 05 14:47:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 14:47:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 05 14:48:00 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 05 14:51:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 14:51:45 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 14:56:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 14:56:59 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 05 14:57:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 14:57:26 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 15:02:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 15:02:04 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 15:04:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 15:07:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 15:07:20 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 15:07:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 15:07:59 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 05 15:09:02 fir-md1-s1 kernel: LustreError: 30992:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.24@o2ib4 arrived at 1565042942 with bad export cookie 6746082930314741500 Aug 05 15:09:12 fir-md1-s1 kernel: LustreError: 46813:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.24@o2ib4 arrived at 1565042952 with bad export cookie 6746082930178709317 Aug 05 15:11:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:12:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 15:12:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 15:12:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:12:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ada580400, cur 1565043169 expire 1565043019 last 1565042942 Aug 05 15:12:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0c6d98c000, cur 1565043179 expire 1565043029 last 1565042952 Aug 05 15:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 15:17:27 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 05 15:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 15:22:35 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 15:23:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 15:23:30 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 05 15:26:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:27:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 15:27:33 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 05 15:28:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:29:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 15:32:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 15:32:39 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 15:33:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 15:33:37 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 05 15:35:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:37:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 15:37:39 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 05 15:43:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 15:43:30 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 15:43:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 15:43:38 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 15:47:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 15:47:55 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 05 15:53:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 15:53:39 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 15:53:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 15:53:45 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 15:56:30 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:57:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:58:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 15:58:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 15:58:20 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 05 16:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8633279b-50fa-c303-2b0c-21f61d483f5e (at 10.9.109.47@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252421c800, cur 1565046159 expire 1565046009 last 1565045932 Aug 05 16:03:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 16:03:48 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 16:04:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 16:08:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 16:08:23 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 05 16:08:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 16:08:23 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 05 16:09:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:10:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:11:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 16:11:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 16:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 16:13:58 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 16:18:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 16:18:23 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 05 16:20:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 16:20:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 16:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 16:23:59 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 05 16:24:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:24:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 16:28:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 16:28:28 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 05 16:32:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 16:32:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 16:32:43 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 05 16:33:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:34:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 16:34:11 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 05 16:37:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:38:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 16:38:31 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 05 16:42:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 16:42:47 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 05 16:43:09 fir-md1-s1 kernel: Lustre: 23723:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565048582/real 1565048582] req@ffff8f2bafeb6900 x1636755566220352/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565048589 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 16:44:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 16:44:13 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 05 16:45:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:48:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 16:48:35 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 05 16:53:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 16:53:41 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 05 16:54:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 16:54:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 16:54:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 05 16:57:04 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 05 16:58:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 16:58:46 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 17:02:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 17:04:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 17:04:39 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 17:04:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 17:04:55 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 05 17:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 17:08:53 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 05 17:14:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 17:14:48 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 17:14:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 17:14:58 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 17:15:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 17:15:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 17:19:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 17:19:15 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 17:24:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 17:24:51 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 17:25:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 17:25:00 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 05 17:26:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 17:26:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 17:30:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 17:30:13 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 05 17:34:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 17:34:56 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 17:35:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 17:35:18 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 05 17:36:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 17:36:25 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 05 17:40:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b9325182-a28c-587e-2f12-edf11a3d8292 (at 10.9.112.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28280f5c00, cur 1565052013 expire 1565051863 last 1565051786 Aug 05 17:40:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 05 17:40:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 17:40:16 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 05 17:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0eaeb89b-859f-1fc8-d1f0-672563c1d160 (at 10.8.8.24@o2ib6) in 186 seconds. I think it's dead, and I am evicting it. exp ffff8f213443d000, cur 1565052089 expire 1565051939 last 1565051903 Aug 05 17:41:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 05 17:42:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client eb3c77e9-52e0-867c-c008-e3641e509af1 (at 10.8.8.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2cdb314c00, cur 1565052130 expire 1565051980 last 1565051903 Aug 05 17:42:10 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 05 17:42:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8bcbb71f-dec9-01fd-fa31-3d32f5a62a50 (at 10.8.8.23@o2ib6) in 170 seconds. I think it's dead, and I am evicting it. exp ffff8f2507869000, cur 1565052165 expire 1565052015 last 1565051995 Aug 05 17:42:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 05 17:43:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 908b7e20-ca64-f207-eb7c-7b3d028780cb (at 10.8.8.19@o2ib6) in 188 seconds. I think it's dead, and I am evicting it. 
exp ffff8f457f837400, cur 1565052206 expire 1565052056 last 1565052018 Aug 05 17:43:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 17:44:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1dccfe10-92fc-f925-ce99-469da8f9fab0 (at 10.8.8.19@o2ib6) in 223 seconds. I think it's dead, and I am evicting it. exp ffff8f2164872400, cur 1565052241 expire 1565052091 last 1565052018 Aug 05 17:45:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 17:45:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 17:46:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 17:46:21 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 05 17:49:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 17:49:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 17:50:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 17:50:38 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 05 17:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15e2428400, cur 1565052863 expire 1565052713 last 1565052636 Aug 05 17:54:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 05 17:55:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 17:55:38 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 05 17:57:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 17:57:56 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 05 18:00:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 18:00:42 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 05 18:07:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 18:07:20 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 18:10:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 05 18:10:15 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 18:11:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 18:11:00 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 05 18:13:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 18:13:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 18:15:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 18:15:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 18:16:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16823a0800, cur 1565054207 expire 1565054057 last 1565053980 Aug 05 18:17:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 18:17:29 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 05 18:17:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8bcbb71f-dec9-01fd-fa31-3d32f5a62a50 (at 10.8.8.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2513658800, cur 1565054273 expire 1565054123 last 1565054046 Aug 05 18:18:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 18:18:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 18:20:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 18:20:16 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 05 18:21:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 18:21:14 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 05 18:23:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 18:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 18:27:33 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 18:30:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 18:30:25 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 18:30:47 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 05 18:30:47 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 05 18:31:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 18:31:28 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 05 18:35:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 18:35:49 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 05 18:37:05 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.24@o2ib6 arrived at 1565055425 with bad export cookie 6746082379196186919 Aug 05 18:37:10 fir-md1-s1 kernel: LustreError: 31003:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.21@o2ib6 arrived at 1565055430 with bad export cookie 6746082879672274219 Aug 05 18:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 18:37:47 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 05 18:40:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 18:40:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 18:41:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 18:41:35 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 05 18:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 18:48:34 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 05 18:50:02 fir-md1-s1 kernel: LustreError: 25082:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.19@o2ib6 arrived at 1565056202 with bad export cookie 6746082289091765554 Aug 05 18:50:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 18:50:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 18:51:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 18:51:08 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 18:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 18:51:48 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 05 18:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 18:59:28 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 19:01:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 19:01:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 19:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 19:01:57 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 05 19:02:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 19:02:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 19:11:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 19:11:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 19:12:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 19:12:11 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 05 19:14:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Aug 05 19:14:06 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 19:20:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 19:20:57 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 19:21:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 19:21:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 05 19:22:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 19:22:16 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 05 19:28:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 19:28:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 05 19:31:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 19:31:43 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 19:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 19:32:19 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 05 19:32:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 19:32:45 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 19:38:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 19:38:34 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 19:41:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 19:41:54 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 05 19:42:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 19:42:20 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 05 19:44:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 19:44:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 19:49:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 19:49:03 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 19:52:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 19:52:04 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 19:53:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 19:53:00 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 05 19:53:54 fir-md1-s1 kernel: Lustre: 97664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565060027/real 1565060027] req@ffff8f1acd949500 x1636755619894144/t0(0) o104->fir-MDT0000@10.8.12.12@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565060034 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 05 
19:59:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 19:59:04 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 05 19:59:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 19:59:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 20:02:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 20:02:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 20:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 20:03:13 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 05 20:10:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 20:10:43 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 20:12:04 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06b1a58850 x1634140708755200/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:9/0 lens 488/440 e 1 to 0 dl 1565061129 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 20:12:04 fir-md1-s1 kernel: Lustre: 55010:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 05 20:12:20 fir-md1-s1 kernel: Lustre: 49228:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:11s); client may timeout. 
req@ffff8f06b1a58850 x1634140708755200/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:9/0 lens 488/408 e 1 to 0 dl 1565061129 ref 1 fl Complete:/0/0 rc 131072/131072 Aug 05 20:12:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 20:12:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 05 20:13:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 20:13:22 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 05 20:14:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 20:14:19 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 20:20:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 20:20:47 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 05 20:22:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 20:22:42 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 20:23:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 20:23:38 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 05 20:27:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 20:27:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 05 20:30:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 20:30:55 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 05 20:32:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 20:32:51 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 05 20:33:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 20:33:46 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 05 20:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 20:40:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 20:41:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 20:41:42 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 05 20:42:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 20:42:55 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 05 20:43:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 20:43:47 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 05 20:46:36 fir-md1-s1 kernel: Lustre: 81718:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0bb72c6850 x1631330648450192/t0(0) o3->5c9f5376-a105-7e2f-1c52-759657f6fd7d@10.9.101.59@o2ib4:11/0 lens 488/16824 e 1 to 0 dl 1565063201 ref 2 fl Interpret:/0/0 rc 0/0 Aug 
05 20:51:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 20:51:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 20:52:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 20:52:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 20:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 20:53:12 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 05 20:53:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 20:53:55 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 20:54:47 fir-md1-s1 kernel: Lustre: 52249:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0bb72c4850 x1634140798480656/t0(0) o3->89c5b213-fa16-71ad-d5f3-58d49989ce10@10.9.115.11@o2ib4:22/0 lens 488/440 e 1 to 0 dl 1565063692 ref 2 fl Interpret:/0/0 rc 0/0 Aug 05 21:02:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 21:02:33 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 21:04:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 21:04:36 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 05 21:04:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 21:04:45 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 21:06:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available 
for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 21:06:29 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 21:12:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 21:12:55 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 05 21:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 05 21:15:31 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 05 21:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 05 21:15:31 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 05 21:19:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 21:19:49 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Aug 05 21:23:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 21:23:17 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 21:25:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 21:25:55 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 05 21:26:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 21:26:04 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 05 21:30:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 05 21:30:04 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 21:33:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 21:33:20 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 05 21:36:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 21:36:00 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 05 21:36:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 21:36:27 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 21:44:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 21:44:14 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 05 21:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 05 21:46:18 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 05 21:46:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 21:46:47 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 21:51:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 21:51:23 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 21:54:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 21:54:20 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 21:56:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 21:56:37 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 05 21:57:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 21:57:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 21:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 21:57:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 05 21:59:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 21:59:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 22:04:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 22:04:46 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 22:05:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 22:05:53 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 05 22:06:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 22:06:37 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 05 22:07:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 05 22:07:33 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 05 22:15:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 22:15:56 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 05 22:16:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 22:16:41 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 05 22:18:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 22:18:03 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 05 22:27:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 22:27:11 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 05 22:27:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 22:27:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 05 22:28:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 05 22:28:03 fir-md1-s1 kernel: Lustre: Skipped 18 
previous similar messages Aug 05 22:30:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 22:30:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 22:33:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 22:33:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 05 22:35:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 22:35:42 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 22:38:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 22:38:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 05 22:38:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 22:38:32 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 05 22:38:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 22:38:36 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 05 22:42:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 22:48:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 05 22:48:34 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 22:48:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 05 22:48:34 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 05 22:49:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 22:49:19 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 22:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 22:58:47 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 05 22:58:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 22:58:48 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 05 22:59:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 22:59:44 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 05 22:59:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 22:59:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 23:08:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 23:08:57 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 05 23:08:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 23:08:57 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 05 23:09:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 05 23:09:54 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 23:10:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 23:10:31 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 23:18:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 23:18:58 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 05 23:18:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 23:18:58 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 05 23:20:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 23:20:08 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 23:24:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 23:24:53 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 05 23:29:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 23:29:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 05 23:29:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 23:29:01 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 05 23:30:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 23:30:30 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 05 23:38:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2dc32fe000, cur 1565073513 expire 1565073363 last 1565073286 Aug 05 23:39:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 23:39:06 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 05 23:39:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 23:39:15 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 05 23:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 23:40:43 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 05 23:41:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 05 23:41:41 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 05 23:49:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 05 23:49:10 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 05 23:49:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 23:49:16 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 05 23:50:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 05 23:50:54 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 05 23:54:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 05 23:54:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 05 23:59:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 05 23:59:15 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 05 23:59:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 05 23:59:17 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 00:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 00:01:13 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 00:04:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 00:04:22 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 00:09:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 00:09:26 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 06 00:09:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 00:09:46 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 06 00:11:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 00:11:41 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 00:18:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24d093c800, cur 1565075901 expire 1565075751 last 1565075674 Aug 06 00:18:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 00:18:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 00:19:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 00:19:28 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 06 00:19:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 06 00:19:56 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 00:20:38 fir-md1-s1 kernel: Lustre: 20726:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565076031/real 1565076031] req@ffff8f20c618d100 x1636755670906512/t0(0) o104->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565076038 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 00:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 00:22:35 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 00:30:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 00:30:06 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 00:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 00:30:06 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 06 00:32:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 00:32:47 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 00:34:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 00:34:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 00:39:14 fir-md1-s1 kernel: Lustre: 20729:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565077147/real 1565077147] req@ffff8f1f96566900 x1636755673995744/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565077154 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 00:40:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 00:40:27 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 06 00:40:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 06 00:40:54 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 00:44:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 00:44:21 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 00:47:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 00:47:15 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 00:50:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 00:50:33 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 06 00:51:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 00:51:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 00:54:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 00:54:22 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 01:00:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 01:00:34 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 06 01:01:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 01:01:12 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 01:01:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 01:01:37 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 01:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 01:04:48 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 06 01:07:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2888287000, cur 1565078850 expire 1565078700 last 1565078623 Aug 06 01:10:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 01:10:38 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 06 01:11:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 01:11:41 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 01:12:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 01:12:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 01:15:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 01:15:12 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 06 01:20:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 01:20:47 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 06 01:22:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 01:22:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 01:22:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 01:22:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 01:22:21 fir-md1-s1 kernel: Lustre: 23751:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565079734/real 1565079734] req@ffff8f30caa0ad00 x1636755682159856/t0(0) o106->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565079741 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 01:25:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 01:25:28 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 06 01:28:43 fir-md1-s1 kernel: Lustre: 23728:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565080116/real 1565080116] req@ffff8f343c78d100 x1636755683186640/t0(0) o104->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565080123 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 01:30:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 01:30:50 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 06 01:32:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 01:32:44 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 01:35:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 01:35:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 01:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1695791400, cur 1565080672 expire 1565080522 last 1565080445 Aug 06 01:38:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 01:38:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 01:41:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 01:41:02 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 06 01:44:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 01:44:26 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 06 01:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 01:46:00 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 01:50:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a4997f400, cur 1565081434 expire 1565081284 last 1565081207 Aug 06 01:51:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 01:51:40 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 06 01:54:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 01:54:37 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 01:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 01:56:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 06 01:56:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 01:56:25 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 01:58:37 fir-md1-s1 kernel: Lustre: 23723:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565081910/real 1565081910] req@ffff8f3d30fe5d00 x1636755687973600/t0(0) o104->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565081917 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 02:01:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 02:01:53 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 02:04:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 02:04:56 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 06 02:06:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 02:06:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 02:08:45 fir-md1-s1 kernel: LustreError: 46812:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.24@o2ib4 arrived at 1565082525 with bad export cookie 6746082935420207264 Aug 06 02:11:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 02:11:56 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 06 02:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34d6cb0800, cur 1565082752 expire 1565082602 last 1565082525 Aug 06 02:12:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 06 02:16:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 02:16:36 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 02:16:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 02:16:46 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 02:17:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 02:17:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 02:21:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 02:21:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 02:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 02:22:05 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 06 02:26:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 02:26:45 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 06 02:27:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 02:27:58 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 06 02:32:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 02:32:10 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 06 02:33:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 02:33:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 02:37:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 02:37:18 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 02:38:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 02:38:07 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 02:41:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 02:42:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 02:42:10 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 06 02:47:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 02:47:50 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 02:48:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 02:48:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 02:48:23 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 02:50:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 02:50:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 02:52:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 02:52:13 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 02:56:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 02:56:22 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 02:58:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 02:58:36 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 02:58:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 02:58:44 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 06 03:02:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 03:02:30 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 06 03:07:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 03:07:33 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 03:08:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 03:08:47 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 03:09:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 03:09:13 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 03:12:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 03:12:34 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 06 03:19:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 03:19:42 fir-md1-s1 kernel: LustreError: Skipped 8 previous similar messages Aug 06 03:19:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 03:19:43 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 03:21:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 03:21:14 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 06 03:22:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 03:22:52 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 06 03:30:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 03:30:02 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 03:31:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 03:31:21 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 06 03:33:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 03:33:09 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Aug 06 03:40:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 03:40:44 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 03:41:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 03:41:11 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 03:41:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 03:41:26 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 06 03:42:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 03:43:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 03:43:20 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 06 03:45:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 03:51:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 03:51:22 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 06 03:52:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 03:52:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 03:52:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 03:52:10 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 03:53:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 03:53:43 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 06 04:01:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 04:01:33 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 04:02:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 06 04:02:12 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 06 04:03:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 04:03:53 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 04:04:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 04:04:18 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 04:11:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 04:11:35 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 04:13:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 04:13:04 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 06 04:14:07 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2bfde64b00 x1631707763349440/t0(0) o101->f8938193-b6f4-691f-a9ed-5d03b37d98de@10.8.30.11@o2ib6:12/0 lens 1784/3288 e 1 to 0 dl 1565090052 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 04:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 37f0749e-8f3a-6d24-b35c-7dfd233bc80b (at 10.8.30.11@o2ib6) Aug 06 04:14:13 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 06 04:14:18 fir-md1-s1 kernel: Lustre: 97668:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23dcad7800 x1631569515757424/t0(0) o101->442dd3b5-503d-fa23-0886-f83a3c7ec479@10.8.18.5@o2ib6:23/0 lens 584/3264 e 1 to 0 dl 1565090063 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 04:14:26 fir-md1-s1 kernel: Lustre: 10148:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f2e042d00 x1634180038030784/t0(0) o101->1a643088-ea7a-3acd-f835-98d006253e47@10.8.20.19@o2ib6:1/0 lens 584/3264 e 1 to 0 dl 1565090071 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 04:14:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 04:14:32 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 06 04:15:22 fir-md1-s1 kernel: LustreError: 21676:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565090032, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2c2ca40480/0x5d9ee6b039bb815d lrc: 3/0,1 mode: --/CW res: [0x200029791:0x7f50:0x0].0x0 bits 0x2/0x0 rrc: 123 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21676 timeout: 0 lvb_type: 0 Aug 06 04:15:33 fir-md1-s1 kernel: LustreError: 97650:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565090043, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f14e5545580/0x5d9ee6b039ce0fa8 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 123 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97650 timeout: 0 lvb_type: 0 Aug 06 04:22:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 04:22:07 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 04:23:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 04:23:10 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 06 04:24:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 04:24:20 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 04:29:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 04:29:38 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 04:32:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 04:32:09 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 04:33:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 04:33:12 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 06 04:34:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 04:34:28 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 04:40:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 04:40:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 04:43:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 04:43:16 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 04:43:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 04:43:53 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 04:44:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 04:44:36 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 06 04:47:13 fir-md1-s1 kernel: Lustre: 25675:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565092026/real 1565092026] req@ffff8f2ff88de600 x1636755727134464/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565092033 ref 1 fl Rpc:X/0/ffffffff rc 
0/-1 Aug 06 04:52:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 04:52:35 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 04:53:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 04:53:21 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 06 04:54:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 04:54:08 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 04:54:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2eabdfd800, cur 1565092488 expire 1565092338 last 1565092261 Aug 06 04:54:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 04:54:49 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 06 05:04:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 05:04:20 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 05:04:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 05:04:53 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 06 05:05:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 06 05:05:08 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 05:06:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 05:06:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 05:15:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 05:15:04 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 06 05:15:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 05:15:11 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 06 05:15:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 05:15:22 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 05:18:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 05:18:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 05:25:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 05:25:04 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 06 05:25:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 05:25:15 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 05:26:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 05:26:37 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 05:28:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 05:28:34 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 05:35:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 05:35:10 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 06 05:35:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 05:35:30 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 05:37:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 05:37:24 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 05:39:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 05:39:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 05:45:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 05:45:32 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 06 05:45:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 05:45:36 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 05:47:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 05:47:41 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 05:49:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 05:49:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 05:55:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 05:55:41 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 06 05:56:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 05:56:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 05:57:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 05:57:52 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 06:01:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 06:01:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 06:06:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 06:06:03 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 06 06:07:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 06 06:07:49 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 06 06:09:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 06:09:03 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 06:12:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 06:12:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 06:16:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 06:16:47 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 06:18:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 06 06:18:24 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 06 06:19:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 06:19:58 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 06:27:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 06:27:04 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 06:28:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 06:28:45 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 06:30:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 06:30:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 06:30:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 06:30:29 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 06:37:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 06:37:04 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 06:38:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 06:38:52 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 06:40:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 06:40:27 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 06:42:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 06:42:18 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 06:47:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 06:47:14 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 06 06:49:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 06:49:19 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 06 06:50:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 06:50:44 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 06 06:57:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 06:57:17 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 06 06:59:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing 
former export from same NID Aug 06 06:59:43 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 06 07:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 07:00:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 07:02:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 07:02:52 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 07:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 07:07:21 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 07:10:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 07:10:54 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 06 07:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 07:10:56 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 07:12:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 07:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 07:17:22 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 06 07:18:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 07:20:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 07:20:57 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 06 07:21:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 07:21:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 07:27:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 07:27:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 07:27:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 07:27:54 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 06 07:31:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 07:31:00 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 07:31:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 07:31:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 06 07:37:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 07:37:57 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 07:40:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 07:40:28 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 07:41:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 07:41:02 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 06 07:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 07:41:22 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 07:42:57 fir-md1-s1 kernel: Lustre: 10144:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565102570/real 1565102570] req@ffff8f2ff88def00 x1636755770272704/t0(0) o104->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565102577 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 07:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 07:48:16 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 06 07:49:27 fir-md1-s1 kernel: Lustre: 23734:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f27966cb000 x1631348488215184/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:2/0 lens 376/1600 e 0 to 0 dl 1565102972 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 07:50:01 fir-md1-s1 kernel: Lustre: 21414:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:29s); client may timeout. req@ffff8f27966cb000 x1631348488215184/t357047722072(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:2/0 lens 376/944 e 0 to 0 dl 1565102972 ref 1 fl Complete:/0/0 rc 0/0 Aug 06 07:50:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 07:50:57 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 06 07:51:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 07:51:25 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 06 07:51:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 07:51:45 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 06 07:58:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 07:58:19 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 08:01:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 08:01:31 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 08:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 08:01:57 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 08:03:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 08:09:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 08:09:06 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 06 08:11:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 08:11:54 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 08:12:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 08:12:59 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 06 08:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 08:19:11 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 08:22:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 06 08:22:52 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 08:23:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 08:23:26 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 08:23:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 08:23:55 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 06 08:29:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 08:29:28 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 08:33:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 06 08:33:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 08:33:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 08:33:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 08:36:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 08:36:01 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 08:40:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 08:40:15 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 08:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 08:44:10 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 08:46:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 08:46:07 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 06 08:46:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 08:46:54 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 08:50:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 08:50:39 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 06 08:54:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 08:54:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 08:57:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 08:57:16 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 06 08:58:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 08:58:13 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 09:00:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 09:00:41 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 06 09:05:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 09:05:01 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 09:10:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 09:10:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 09:10:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 09:10:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 09:10:42 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 06 09:15:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 09:15:38 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 09:20:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 09:20:11 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 09:20:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 09:20:43 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 09:25:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 09:25:55 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 09:29:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ddf690800, cur 1565108966 expire 1565108816 last 1565108739 Aug 06 09:30:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 09:30:15 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 09:30:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 09:30:44 fir-md1-s1 kernel: Lustre: Skipped 104332 previous similar messages Aug 06 09:31:16 fir-md1-s1 kernel: Lustre: 23744:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565109069/real 1565109069] req@ffff8f30e4b42700 x1636755833137808/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565109076 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 09:36:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 09:36:09 fir-md1-s1 kernel: Lustre: Skipped 104298 previous similar messages Aug 06 09:37:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 09:37:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 09:40:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 09:40:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 09:40:49 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 06 09:41:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 09:41:11 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 09:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e2ddb6400, cur 1565109685 expire 1565109535 last 1565109458 Aug 06 09:43:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 09:43:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 09:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 09:46:33 fir-md1-s1 kernel: Lustre: Skipped 130309 previous similar messages Aug 06 09:47:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 786945fd-d0e4-9127-4dce-4fcd2bed9b64 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b1b001800, cur 1565110021 expire 1565109871 last 1565109794 Aug 06 09:51:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 09:51:23 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 06 09:51:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 09:51:23 fir-md1-s1 kernel: Lustre: Skipped 130319 previous similar messages Aug 06 09:55:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 09:56:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 09:56:49 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 06 09:58:03 fir-md1-s1 kernel: Lustre: 35237:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06b1a5b450 x1631586364218304/t0(0) o4->75c31e1e-77de-1d06-3ba1-5bf70911b79e@10.9.104.58@o2ib4:8/0 lens 488/448 e 1 to 0 dl 1565110688 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 10:02:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 10:02:32 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 10:02:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 10:02:35 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 10:06:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 10:06:56 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 06 10:12:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 10:12:35 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 06 10:13:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 10:13:47 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 10:17:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 10:17:15 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 06 10:18:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 06 10:18:44 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 10:20:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 10:22:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 10:22:57 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 06 10:23:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 10:23:53 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 10:26:42 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5cfc1f87-c461-6789-22ec-6d26a04c4a40 (at 10.9.109.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2528ac5000, cur 1565112402 expire 1565112252 last 1565112175 Aug 06 10:27:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 10:27:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 10:27:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a2371f50-e170-4029-40e5-eaae6a9f9044 (at 10.9.109.49@o2ib4) in 214 seconds. I think it's dead, and I am evicting it. exp ffff8f34fe977800, cur 1565112478 expire 1565112328 last 1565112264 Aug 06 10:27:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 06 10:29:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 10:33:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 10:33:51 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 06 10:35:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 10:35:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 06 10:37:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 10:37:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 10:38:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 10:38:20 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 06 10:43:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 10:43:56 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 06 10:47:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 06 10:47:25 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 10:48:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 10:48:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 10:49:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 10:49:25 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 06 10:53:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 10:53:59 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 06 10:57:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 06 10:57:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 06 10:58:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 10:58:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 11:00:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 11:00:15 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 11:01:18 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565114471/real 1565114471] req@ffff8f16a3844500 x1636755898409488/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565114478 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 11:01:25 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565114478/real 1565114478] req@ffff8f16a3844500 x1636755898409488/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565114485 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 06 11:01:54 fir-md1-s1 kernel: Lustre: 97659:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565114507/real 1565114507] req@ffff8f249c7d3c00 x1636755898724576/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565114514 ref 1 fl 
Rpc:X/0/ffffffff rc 0/-1 Aug 06 11:02:01 fir-md1-s1 kernel: Lustre: 97659:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565114514/real 1565114514] req@ffff8f249c7d3c00 x1636755898724576/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565114521 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 06 11:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 11:04:00 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 06 11:07:35 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f36217d4800, cur 1565114855 expire 1565114705 last 1565114628 Aug 06 11:07:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 06 11:07:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 11:07:54 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 11:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 11:08:54 fir-md1-s1 kernel: Lustre: Skipped 18918 previous similar messages Aug 06 11:14:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 908b7e20-ca64-f207-eb7c-7b3d028780cb (at 10.8.8.19@o2ib6) Aug 06 11:14:18 fir-md1-s1 kernel: Lustre: Skipped 18953 previous similar messages Aug 06 11:18:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 11:18:04 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 11:19:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 11:19:02 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 
11:22:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 11:22:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 11:23:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 11:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 11:24:19 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 06 11:29:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 11:29:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 11:30:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 11:30:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 11:30:12 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 06 11:32:04 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f27158fa100 x1638091478037536/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:9/0 lens 1784/3288 e 1 to 0 dl 1565116329 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 11:32:04 fir-md1-s1 kernel: Lustre: 21668:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f09eedd4e00 x1640802316246064/t0(0) o101->f5de3965-7389-a296-8c42-1779e3e91d02@10.9.103.20@o2ib4:9/0 lens 584/3264 e 1 to 0 dl 1565116329 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 11:32:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1f9fb68900/0x5d9ee6b0babdb7ff lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 144 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226bfaa43 expref: 418 pid: 26256 timeout: 4231398 lvb_type: 0 Aug 06 11:34:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 11:34:22 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 06 11:37:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 11:37:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 11:39:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 11:39:17 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 06 11:43:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 11:43:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 11:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 11:44:23 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 06 11:49:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 11:49:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 11:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 11:54:30 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 06 11:55:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 11:55:05 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 06 11:55:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 11:55:32 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 11:59:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 11:59:31 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 12:04:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 12:04:41 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 06 12:06:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 12:06:14 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 06 12:06:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 12:06:15 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 06 12:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fda7476c-3001-fa74-1129-6cef5100b933 (at 10.9.102.66@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1619975400, cur 1565118523 expire 1565118373 last 1565118296 Aug 06 12:09:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 12:09:51 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 12:14:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 12:14:45 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 12:16:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 12:16:17 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 06 12:20:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 12:20:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 12:25:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 12:25:05 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 06 12:26:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 12:26:20 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 12:26:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 12:26:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 12:30:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 12:30:05 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 12:33:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f27ab2c1400, cur 1565120005 expire 1565119855 last 1565119778 Aug 06 12:33:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 06 12:35:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 12:35:14 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 06 12:36:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 12:38:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 06 12:38:55 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 06 12:40:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 12:40:17 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 12:40:28 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25e6065000, cur 1565120428 expire 1565120278 last 1565120201 Aug 06 12:40:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 12:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 12:45:23 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 06 12:49:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 12:49:24 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 06 12:50:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 12:50:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 12:53:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 12:53:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 12:55:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 12:55:33 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 06 12:59:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 12:59:33 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 06 13:00:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 13:00:38 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 06 13:04:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 13:04:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 13:05:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 13:05:41 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 06 13:09:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 13:09:49 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 06 13:10:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 13:10:48 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 13:15:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 13:15:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 13:17:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 13:17:31 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 13:19:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 13:19:52 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 13:21:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 13:21:18 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 06 13:25:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 13:25:51 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 06 13:27:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 13:27:40 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 13:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 13:31:31 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 13:31:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 13:31:48 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 06 13:36:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 13:36:03 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 06 13:38:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 13:38:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 13:42:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 13:42:00 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 13:42:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 06 13:42:20 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 06 13:46:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 13:46:09 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 06 13:48:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 13:48:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 13:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 13:52:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 13:52:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 13:52:28 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 13:56:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 13:56:10 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 06 13:59:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 13:59:07 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 14:02:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 14:02:30 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 06 14:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 14:02:34 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 06 14:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 14:06:22 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 06 14:09:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 14:09:24 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 06 14:10:19 fir-md1-s1 kernel: Lustre: 20461:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1eab336000 x1631348495672624/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:24/0 lens 376/1600 e 0 to 0 dl 1565125824 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 14:10:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f25219eee40/0x5d9ee6b0e6e0734d lrc: 3/0,0 mode: PR/PR res: [0x2c002c7ca:0x8d:0x0].0x0 bits 0x19/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a328f34a expref: 1270 pid: 22288 timeout: 4240883 lvb_type: 0 Aug 06 14:10:23 fir-md1-s1 kernel: LustreError: 22288:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f28cfbef400 ns: mdt-fir-MDT0002_UUID lock: 
ffff8f1e90889b00/0x5d9ee6b0e6e07e83 lrc: 3/0,0 mode: EX/EX res: [0x2c002c7ca:0x8d:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a328f374 expref: 813 pid: 22288 timeout: 0 lvb_type: 3 Aug 06 14:12:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 14:12:52 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 14:13:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 14:13:49 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 06 14:16:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 14:16:23 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Aug 06 14:21:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 14:21:54 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 14:22:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 14:22:55 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 14:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 14:26:42 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Aug 06 14:28:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 14:28:02 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Aug 06 14:33:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 14:33:15 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 14:33:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 14:33:40 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 14:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 14:37:09 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 06 14:38:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 14:38:30 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 06 14:43:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 14:43:26 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 14:45:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 14:45:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 14:47:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 14:47:13 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 06 14:48:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 14:48:36 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 06 14:53:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 14:53:57 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 14:56:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f334cb7bc00, cur 1565128592 expire 1565128442 last 1565128365 Aug 06 14:57:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 14:57:17 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 06 14:58:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 14:58:37 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 15:03:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 15:03:43 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 15:04:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 15:04:39 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 15:07:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 15:07:29 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 06 15:08:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 15:08:39 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 06 15:14:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 15:14:40 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 15:16:55 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f16140e4000, cur 1565129815 expire 1565129665 last 1565129588 Aug 06 15:17:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 15:17:35 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 15:18:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 15:18:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 15:18:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 15:18:39 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 15:24:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 15:24:42 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 15:27:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 15:27:50 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 06 15:28:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 15:28:50 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 06 15:29:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 15:29:42 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 15:35:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 15:35:06 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 15:37:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 15:37:50 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 06 15:39:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 15:39:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 06 15:41:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 15:41:35 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 15:45:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 15:45:20 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 15:47:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 15:47:58 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 06 15:49:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 15:49:31 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 15:53:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 15:53:04 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 15:55:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 15:55:32 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 15:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 15:59:28 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 06 16:01:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 16:01:14 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 06 16:05:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 16:05:37 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 16:07:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f361ebdd400, cur 1565132866 expire 1565132716 last 1565132639 Aug 06 16:09:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 16:09:17 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 16:09:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 16:09:36 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 06 16:12:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 06 16:12:37 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 16:14:18 fir-md1-s1 kernel: Lustre: 22284:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565133251/real 1565133251] req@ffff8f1c28880000 x1636756080061136/t0(0) o106->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565133258 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 16:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 16:15:44 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 16:19:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 16:19:38 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 06 16:21:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 16:21:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 16:21:34 fir-md1-s1 kernel: Lustre: 23751:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565133687/real 1565133687] req@ffff8f286da80300 x1636756084306496/t0(0) o106->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565133694 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 16:21:37 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565133690/real 1565133690] req@ffff8f236bcae000 x1636756084339920/t0(0) o106->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565133697 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 16:23:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 06 16:23:39 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 06 16:27:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 16:27:08 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 16:29:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 16:29:45 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 06 16:33:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 16:33:29 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 16:33:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 16:33:40 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 06 16:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 16:37:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 16:40:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 16:40:30 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Aug 06 16:43:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 16:43:57 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 06 16:47:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 16:47:58 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 16:48:07 fir-md1-s1 kernel: Lustre: 21456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565135280/real 1565135280] req@ffff8f238743b600 x1636756098636832/t0(0) o106->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565135287 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 16:48:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 16:48:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 16:50:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 16:50:31 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 06 16:55:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 16:55:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 16:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 16:58:19 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 17:00:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 17:00:35 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 06 17:01:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 17:01:08 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 17:08:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 06 17:08:00 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 17:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 17:08:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 17:10:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 17:10:54 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 06 17:18:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 17:18:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 17:18:29 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 17:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 17:18:45 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 17:21:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 17:21:05 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 06 17:29:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 17:29:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 17:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 17:29:29 fir-md1-s1 kernel: Lustre: Skipped 40 
previous similar messages Aug 06 17:29:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 17:29:49 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 06 17:30:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34e0d09800, cur 1565137824 expire 1565137674 last 1565137597 Aug 06 17:31:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 17:31:10 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 06 17:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 17:39:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 17:40:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 06 17:40:27 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 06 17:40:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 17:40:38 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 17:41:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f16949fc400, cur 1565138461 expire 1565138311 last 1565138234 Aug 06 17:41:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 17:41:12 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 06 17:46:21 fir-md1-s1 kernel: Lustre: 23753:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565138774/real 1565138774] req@ffff8f288e9d2a00 x1636756134815888/t0(0) o104->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565138781 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 17:50:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 17:50:20 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 06 17:50:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 17:50:33 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 06 17:51:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 17:51:15 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 06 17:51:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 17:51:32 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 18:01:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 18:01:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 18:01:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 18:01:25 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 06 18:01:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 18:01:38 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 06 18:05:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 18:05:23 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 18:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 18:11:58 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 06 18:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 18:11:58 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 06 18:12:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 18:12:46 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 06 18:21:12 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f42ebc7f000, cur 1565140872 expire 1565140722 last 1565140645 Aug 06 18:22:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 18:22:04 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 06 18:22:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 18:22:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 18:23:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 18:23:30 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 18:24:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f233510f800, cur 1565141065 expire 1565140915 last 1565140838 Aug 06 18:26:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 18:26:41 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 18:32:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 18:32:15 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 18:32:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 18:32:15 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 06 18:33:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 18:35:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 18:35:32 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 06 18:36:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 18:36:16 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 18:41:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 18:41:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 18:42:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 18:42:20 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 06 18:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 18:43:05 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 18:46:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 18:46:08 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 06 18:53:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 18:53:00 fir-md1-s1 kernel: Lustre: Skipped 101083 previous similar messages Aug 06 18:53:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 18:53:14 fir-md1-s1 kernel: Lustre: Skipped 101065 previous similar messages Aug 06 18:54:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b11960000, cur 1565142881 expire 1565142731 last 1565142654 Aug 06 18:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd7f17ef-720b-3bee-dff4-afb00097809a (at 10.8.26.4@o2ib6) in 166 seconds. I think it's dead, and I am evicting it. exp ffff8f2651cce000, cur 1565142957 expire 1565142807 last 1565142791 Aug 06 18:55:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 06 18:56:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 18:56:15 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 18:56:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7692bafc-4f6e-695c-696e-545f787fb0f2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f11a0f19c00, cur 1565143018 expire 1565142868 last 1565142791 Aug 06 18:56:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 06 19:03:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 19:03:09 fir-md1-s1 kernel: Lustre: Skipped 103137 previous similar messages Aug 06 19:04:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 19:04:09 fir-md1-s1 kernel: Lustre: Skipped 103115 previous similar messages Aug 06 19:06:34 fir-md1-s1 kernel: LustreError: 22007:0:(mdt_lvb.c:430:mdt_lvbo_fill()) fir-MDT0000: small buffer size 632 for EA 656 (max_mdsize 1256): rc = -34 Aug 06 19:06:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 19:06:46 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 06 19:07:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for 
connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 19:07:59 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 06 19:09:03 fir-md1-s1 kernel: Lustre: 23631:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565143736/real 1565143736] req@ffff8f4008dcda00 x1636756200363856/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565143743 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:09:10 fir-md1-s1 kernel: Lustre: 23631:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565143743/real 1565143743] req@ffff8f4008dcda00 x1636756200363856/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565143750 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 06 19:09:30 fir-md1-s1 kernel: Lustre: 50583:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565143763/real 1565143763] req@ffff8f2bd9132d00 x1636756200711984/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565143770 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:09:37 fir-md1-s1 kernel: Lustre: 50583:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565143770/real 1565143770] req@ffff8f2bd9132d00 x1636756200711984/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565143777 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 06 19:11:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 19:11:50 fir-md1-s1 kernel: Lustre: 97655:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565143903/real 1565143903] req@ffff8f2da6127200 x1636756202326256/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565143910 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:13:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 19:13:21 fir-md1-s1 kernel: Lustre: Skipped 51421 previous similar messages Aug 06 19:14:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 19:14:19 fir-md1-s1 kernel: Lustre: Skipped 169593 previous similar messages Aug 06 19:15:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 19:16:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 19:16:48 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 06 19:23:07 fir-md1-s1 kernel: Lustre: 23704:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565144580/real 1565144580] req@ffff8f2eaebe5700 x1636756211520784/t0(0) o104->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565144587 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:23:07 fir-md1-s1 kernel: Lustre: 23704:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 06 19:23:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 19:23:23 fir-md1-s1 kernel: Lustre: Skipped 267393 previous similar messages Aug 06 19:24:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Aug 06 19:24:25 fir-md1-s1 kernel: Lustre: Skipped 149179 previous similar messages Aug 06 19:26:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 19:26:57 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 19:31:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 19:33:25 fir-md1-s1 kernel: Lustre: 23681:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f299cb91b00 x1632261275366400/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:0/0 lens 376/1600 e 0 to 0 dl 1565145210 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 19:33:29 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f236a724140/0x5d9ee6b140bc3c37 lrc: 3/0,0 mode: CR/CR res: [0x2c002c7e9:0x8f:0x0].0x0 bits 0x9/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c69d97 expref: 569 pid: 50447 timeout: 4260269 lvb_type: 0 Aug 06 19:33:29 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f3320900c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f319d5b9440/0x5d9ee6b140bc3f32 lrc: 3/0,0 mode: EX/EX res: [0x2c002c7e9:0x8f:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c69dc8 expref: 281 pid: 23743 timeout: 0 lvb_type: 3 Aug 06 19:33:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 19:33:49 fir-md1-s1 kernel: Lustre: Skipped 6156 previous similar messages Aug 06 19:33:54 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 19:34:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 19:34:26 fir-md1-s1 kernel: Lustre: Skipped 6134 previous similar messages Aug 06 19:37:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 19:37:08 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 06 19:38:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1437faa400, cur 1565145516 expire 1565145366 last 1565145289 Aug 06 19:43:48 fir-md1-s1 kernel: Lustre: 97646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565145821/real 1565145821] req@ffff8f17784fd100 x1636756263326608/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565145828 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:43:55 fir-md1-s1 kernel: Lustre: 97646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565145828/real 1565145828] req@ffff8f17784fd100 x1636756263326608/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565145835 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 06 19:43:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 19:43:57 fir-md1-s1 kernel: Lustre: Skipped 18830 previous similar messages Aug 06 19:44:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 19:44:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 19:44:45 fir-md1-s1 kernel: Lustre: Skipped 18798 previous similar messages Aug 06 19:46:24 fir-md1-s1 kernel: Lustre: 20731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565145977/real 1565145977] req@ffff8f20fa78b300 x1636756265268112/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565145984 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:46:31 fir-md1-s1 kernel: Lustre: 20731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565145984/real 1565145984] req@ffff8f20fa78b300 x1636756265268112/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565145991 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 06 19:47:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 06 19:47:28 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 06 19:48:50 fir-md1-s1 kernel: Lustre: 23588:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565146123/real 1565146123] req@ffff8f28d0f08c00 x1636756266575264/t0(0) o104->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565146130 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:50:07 fir-md1-s1 kernel: Lustre: 97646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565146200/real 1565146200] req@ffff8f2304774e00 x1636756267526384/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565146207 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 19:50:07 fir-md1-s1 kernel: Lustre: 97646:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 06 19:52:49 fir-md1-s1 kernel: LustreError: 25083:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel 
from 10.8.8.17@o2ib6 arrived at 1565146369 with bad export cookie 6746082878345727207 Aug 06 19:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 19:54:06 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 06 19:54:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 19:54:47 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 06 19:57:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 19:57:37 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 20:04:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 20:04:16 fir-md1-s1 kernel: Lustre: Skipped 11721 previous similar messages Aug 06 20:04:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 20:04:54 fir-md1-s1 kernel: Lustre: Skipped 11697 previous similar messages Aug 06 20:06:07 fir-md1-s1 kernel: Lustre: 97660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1897d2f200 x1631348504034096/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:12/0 lens 376/1600 e 1 to 0 dl 1565147172 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 20:06:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f12c34f18c0/0x5d9ee6b1483da2de lrc: 3/0,0 mode: PR/PR res: [0x2c002c7e6:0xa9:0x0].0x0 bits 0x19/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a32b5991 expref: 1773 pid: 97646 timeout: 4262241 lvb_type: 0 Aug 06 20:06:21 fir-md1-s1 kernel: 
LustreError: 97646:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1598e4e400 ns: mdt-fir-MDT0002_UUID lock: ffff8f3f0cf67740/0x5d9ee6b1483da642 lrc: 3/0,0 mode: EX/EX res: [0x2c002c7e6:0xa9:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x50000000000000 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a32b59bb expref: 801 pid: 97646 timeout: 0 lvb_type: 3 Aug 06 20:06:21 fir-md1-s1 kernel: Lustre: 97646:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f1897d2f200 x1631348504034096/t357138259769(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:12/0 lens 376/1568 e 1 to 0 dl 1565147172 ref 1 fl Complete:/0/0 rc -107/-107 Aug 06 20:07:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 20:08:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 20:08:19 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 06 20:14:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 20:14:18 fir-md1-s1 kernel: Lustre: Skipped 36815 previous similar messages Aug 06 20:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 20:15:45 fir-md1-s1 kernel: Lustre: Skipped 36802 previous similar messages Aug 06 20:17:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 20:18:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 06 20:18:20 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 06 20:21:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 20:22:37 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565148150/real 1565148150] req@ffff8f17fa8a9500 x1636756290445248/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565148157 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:23:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 20:23:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 20:24:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 20:24:22 fir-md1-s1 kernel: Lustre: Skipped 38690 previous similar messages Aug 06 20:25:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 20:25:46 fir-md1-s1 kernel: Lustre: Skipped 38671 previous similar messages Aug 06 20:29:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 20:29:10 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 06 20:30:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 20:30:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 20:31:13 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565148666/real 1565148666] req@ffff8f206558bf00 x1636756321724800/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565148673 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:33:50 fir-md1-s1 kernel: Lustre: 97665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565148823/real 1565148823] req@ffff8f18f90a2100 x1636756323372128/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565148830 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:34:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 20:34:29 fir-md1-s1 kernel: Lustre: Skipped 1381 previous similar messages Aug 06 20:35:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 20:35:52 fir-md1-s1 kernel: Lustre: Skipped 1366 previous similar messages Aug 06 20:36:05 fir-md1-s1 kernel: Lustre: 21456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565148958/real 1565148958] req@ffff8f2518cca700 x1636756324551840/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565148965 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 06 20:38:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 20:38:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 20:40:10 fir-md1-s1 kernel: Lustre: 22279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565149203/real 1565149203] req@ffff8f1739d67800 x1636756330834368/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565149210 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:40:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 20:40:54 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 20:44:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 20:44:41 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 06 20:45:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 20:45:57 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 20:47:46 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565149659/real 1565149659] req@ffff8f1e19f45100 x1636756335781200/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565149666 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:47:46 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 06 20:51:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 20:51:13 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 06 20:52:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 20:52:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 20:54:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 20:54:46 fir-md1-s1 kernel: Lustre: Skipped 37665 previous similar messages Aug 06 20:55:41 fir-md1-s1 kernel: Lustre: 21676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565150134/real 1565150134] req@ffff8f27f46e8f00 x1636756340833952/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565150141 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:55:41 fir-md1-s1 kernel: Lustre: 21676:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 06 20:55:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 20:55:59 fir-md1-s1 kernel: Lustre: Skipped 37657 previous similar messages Aug 06 20:58:39 fir-md1-s1 kernel: Lustre: 23576:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565150312/real 1565150312] req@ffff8f11715b7500 x1636756342941744/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565150319 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 20:58:39 fir-md1-s1 kernel: Lustre: 23576:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 06 20:59:17 fir-md1-s1 kernel: Lustre: 97663:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d7a59f200 x1631348504558080/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:22/0 lens 376/1600 e 0 to 0 dl 1565150362 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:00:09 fir-md1-s1 kernel: Lustre: 20722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f235eb1d400 x1635092855343136/t0(0) 
o101->ce99df68-e3a7-efcb-b2cd-56ac5966a69c@10.9.105.4@o2ib4:14/0 lens 576/3264 e 1 to 0 dl 1565150414 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:00:18 fir-md1-s1 kernel: Lustre: 20722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1897dc9b00 x1631870322990912/t0(0) o101->d9780840-1f56-7c7c-79f4-885f3cda00f2@10.8.30.4@o2ib6:23/0 lens 576/3264 e 0 to 0 dl 1565150423 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:00:18 fir-md1-s1 kernel: Lustre: 20722:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Aug 06 21:00:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1dd8ee5580/0x5d9ee6b15e80df97 lrc: 3/0,0 mode: PR/PR res: [0x2c002c742:0x1809f:0x0].0x0 bits 0x13/0x0 rrc: 308 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226c11dac expref: 287 pid: 20460 timeout: 4265482 lvb_type: 0 Aug 06 21:01:24 fir-md1-s1 kernel: Lustre: 23710:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2e9be17200 x1631348504579120/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:29/0 lens 376/1600 e 0 to 0 dl 1565150489 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:01:24 fir-md1-s1 kernel: Lustre: 23710:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 83 previous similar messages Aug 06 21:01:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f16de48f980/0x5d9ee6b15f34c133 lrc: 3/0,0 mode: PR/PR res: [0x2c002c7f1:0x35:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a32bc458 expref: 513 pid: 23645 timeout: 4265548 lvb_type: 0 Aug 06 21:01:28 fir-md1-s1 
kernel: LustreError: 23734:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2ec4377400 ns: mdt-fir-MDT0002_UUID lock: ffff8f3229308480/0x5d9ee6b15f34c8d4 lrc: 3/0,0 mode: EX/EX res: [0x2c002c7f1:0x35:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x50000000000000 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a32bc45f expref: 448 pid: 23734 timeout: 0 lvb_type: 3 Aug 06 21:02:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 21:02:40 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 06 21:03:31 fir-md1-s1 kernel: Lustre: 97642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-6), not sending early reply req@ffff8f1a50698300 x1631353775366080/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:6/0 lens 376/1600 e 0 to 0 dl 1565150616 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:04:35 fir-md1-s1 kernel: LustreError: 97668:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565150585, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1c4edfba80/0x5d9ee6b160d9b5ed lrc: 3/0,1 mode: --/EX res: [0x2c002c7f5:0xd:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97668 timeout: 0 lvb_type: 0 Aug 06 21:05:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 21:05:07 fir-md1-s1 kernel: Lustre: Skipped 12037 previous similar messages Aug 06 21:05:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 21:06:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 21:06:15 fir-md1-s1 kernel: Lustre: Skipped 12012 previous similar messages Aug 06 21:08:11 fir-md1-s1 kernel: Lustre: 23619:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2b637fd100 x1631348504644288/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:16/0 lens 376/1600 e 0 to 0 dl 1565150896 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:08:30 fir-md1-s1 kernel: Lustre: 23642:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565150903/real 1565150903] req@ffff8f2fffdaef00 x1636756349527440/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565150910 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 06 21:13:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f91b69400, cur 1565151200 expire 1565151050 last 1565150973 Aug 06 21:13:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 21:13:35 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 21:15:00 fir-md1-s1 kernel: Lustre: 23678:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a5e30d100 x1632261276382304/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:5/0 lens 376/1600 e 1 to 0 dl 1565151305 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:15:14 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f4305aa1f80/0x5d9ee6b16829dc9f lrc: 3/0,0 mode: PR/PR res: [0x2c002c7ee:0xb5:0x0].0x0 bits 0x19/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c77999 expref: 642 pid: 50583 timeout: 4266374 lvb_type: 0 Aug 06 21:15:14 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f399fb52c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2fcb92bcc0/0x5d9ee6b16829e0f8 lrc: 1/0,0 mode: EX/EX res: [0x2c002c7ee:0xb5:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x54801000000000 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c779c3 expref: 231 pid: 23645 timeout: 0 lvb_type: 3 Aug 06 21:15:14 fir-md1-s1 kernel: Lustre: 23645:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. 
req@ffff8f2a5e30d100 x1632261276382304/t357144844856(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:5/0 lens 376/1568 e 1 to 0 dl 1565151305 ref 1 fl Complete:/0/0 rc -107/-107 Aug 06 21:15:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 21:15:22 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 06 21:16:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 21:16:33 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 06 21:20:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 21:20:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 21:23:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 21:23:38 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 21:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 21:25:22 fir-md1-s1 kernel: Lustre: Skipped 1699 previous similar messages Aug 06 21:25:48 fir-md1-s1 kernel: Lustre: 22005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d6929b000 x1631564094745936/t0(0) o101->d594a152-d993-c755-50bf-0f3b806ddc60@10.9.107.22@o2ib4:23/0 lens 584/3264 e 1 to 0 dl 1565151953 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:25:59 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f23cca42400 x1631614041910672/t0(0) o101->c55a1e29-63e2-1cd7-0e7e-0d12f0df0a6c@10.8.30.1@o2ib6:3/0 lens 576/3264 e 0 to 0 dl 1565151963 ref 
2 fl Interpret:/0/0 rc 0/0 Aug 06 21:25:59 fir-md1-s1 kernel: Lustre: 21456:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 06 21:26:14 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-6), not sending early reply req@ffff8f2dc5561500 x1631777463314688/t0(0) o101->dc01060d-ea41-80d7-2b03-1a4b061c0f7e@10.8.13.1@o2ib6:18/0 lens 576/3264 e 0 to 0 dl 1565151978 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:26:14 fir-md1-s1 kernel: Lustre: 23642:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 06 21:26:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 196b1913-b944-5fc1-c179-e301c8a174ad (at 10.9.102.7@o2ib4) reconnecting Aug 06 21:26:33 fir-md1-s1 kernel: Lustre: Skipped 1689 previous similar messages Aug 06 21:26:35 fir-md1-s1 kernel: Lustre: 21128:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f391725d100 x1631545028702976/t0(0) o101->5fdf1bc5-187c-a555-f9d1-818d91c0bfa4@10.9.105.9@o2ib4:10/0 lens 584/3264 e 1 to 0 dl 1565152000 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:26:35 fir-md1-s1 kernel: Lustre: 21128:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Aug 06 21:27:03 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565151933, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f321229b3c0/0x5d9ee6b16b28adc7 lrc: 3/0,1 mode: --/CW res: [0x200029791:0x7f50:0x0].0x0 bits 0x2/0x0 rrc: 149 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23743 timeout: 0 lvb_type: 0 Aug 06 21:27:03 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 06 21:27:07 fir-md1-s1 kernel: LustreError: 
21414:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565151936, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2e95bebcc0/0x5d9ee6b16b2d4963 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 150 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21414 timeout: 0 lvb_type: 0 Aug 06 21:27:07 fir-md1-s1 kernel: LustreError: 21414:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Aug 06 21:27:07 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f85aa7500 x1635358013438816/t0(0) o101->e90e26e9-54e6-7601-c634-05b1cc133462@10.8.18.18@o2ib6:12/0 lens 584/3264 e 0 to 0 dl 1565152032 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:27:07 fir-md1-s1 kernel: Lustre: 23728:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Aug 06 21:27:18 fir-md1-s1 kernel: LustreError: 22284:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565151948, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f27dae5f500/0x5d9ee6b16b4561f9 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 151 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22284 timeout: 0 lvb_type: 0 Aug 06 21:27:26 fir-md1-s1 kernel: Lustre: 97650:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f1944967200 x1631585843072768/t0(0) o101->3d7e8f12-7be2-ea29-b7bf-4852602a4361@10.9.106.56@o2ib4:25/0 lens 584/536 e 0 to 0 dl 1565152045 ref 1 fl Complete:/0/0 rc 0/0 Aug 06 21:27:26 fir-md1-s1 kernel: Lustre: 24586:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f2eb45ef500 x1631777526797360/t0(0) o101->cb1e051f-12ef-c393-c1de-bc60ba01debc@10.8.13.11@o2ib6:25/0 lens 584/536 e 0 to 0 dl 1565152045 ref 1 fl Complete:/0/0 rc 0/0 Aug 06 21:27:26 fir-md1-s1 kernel: Lustre: 97650:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 06 21:29:00 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f266cfe2d00 x1632261276498528/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:5/0 lens 376/1600 e 0 to 0 dl 1565152145 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:29:00 fir-md1-s1 kernel: Lustre: 21003:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Aug 06 21:30:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 21:30:55 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 06 21:31:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2913953800, cur 1565152273 expire 1565152123 last 1565152046 Aug 06 21:31:21 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f171f269800 x1631679170837536/t0(0) o101->0d6b5cb8-9f2b-7df7-1a69-4be7494f679a@10.8.11.32@o2ib6:26/0 lens 1792/3288 e 0 to 0 dl 1565152286 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:31:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2cba8e8480/0x5d9ee6b16e610d2b lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 142 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226c151b5 expref: 100 pid: 23659 timeout: 4267345 lvb_type: 0 Aug 06 21:33:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 21:33:46 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 21:35:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 21:35:32 fir-md1-s1 kernel: Lustre: Skipped 140 previous similar messages Aug 06 21:36:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 21:36:41 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 06 21:44:07 fir-md1-s1 kernel: Lustre: 23740:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f28714f6000 x1631348504952208/t0(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:12/0 lens 376/1600 e 1 to 0 dl 1565153052 ref 2 fl Interpret:/0/0 rc 0/0 Aug 06 21:44:07 fir-md1-s1 kernel: Lustre: 23740:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 06 21:44:21 fir-md1-s1 
kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f30d63e18c0/0x5d9ee6b178afb955 lrc: 3/0,0 mode: PR/PR res: [0x2c002c7f6:0xe:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a32bf607 expref: 296 pid: 21003 timeout: 4268121 lvb_type: 0 Aug 06 21:44:21 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f3321ce1c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f26d80d5e80/0x5d9ee6b178afc502 lrc: 1/0,0 mode: EX/EX res: [0x2c002c7f6:0xe:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.22.20@o2ib6 remote: 0xe96edf08a32bf60e expref: 15 pid: 23743 timeout: 0 lvb_type: 3 Aug 06 21:44:21 fir-md1-s1 kernel: Lustre: 23743:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f28714f6000 x1631348504952208/t357147700845(0) o101->eb03d68c-4477-fd95-4120-c15d0364314e@10.8.22.20@o2ib6:12/0 lens 376/1568 e 1 to 0 dl 1565153052 ref 1 fl Complete:/0/0 rc -107/-107 Aug 06 21:45:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 21:45:02 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 06 21:46:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 21:46:15 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 06 21:47:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 21:47:01 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 21:49:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 06 21:51:26 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f307aae7500/0x5d9ee6b17ff5f2dd lrc: 3/0,0 mode: PR/PR res: [0x2c002c7f7:0x2c:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c7b8df expref: 376 pid: 23743 timeout: 4268546 lvb_type: 0 Aug 06 21:51:26 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2f9c90ac00 ns: mdt-fir-MDT0002_UUID lock: ffff8f28739e8900/0x5d9ee6b17ff5fe7c lrc: 1/0,0 mode: EX/EX res: [0x2c002c7f7:0x2c:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c7b8e6 expref: 4 pid: 23743 timeout: 0 lvb_type: 3 Aug 06 21:55:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 21:55:23 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 06 21:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 21:56:21 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 06 21:57:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 21:57:02 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 21:59:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 21:59:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 22:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 22:06:22 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 06 22:06:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 06 22:06:50 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 06 22:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 22:07:34 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 22:10:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 22:10:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 22:16:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 22:16:22 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Aug 06 22:16:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 22:16:52 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 06 22:17:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 22:17:43 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 22:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 22:26:34 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 06 22:28:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 
10.8.10.21@o2ib6) reconnecting Aug 06 22:28:09 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 06 22:28:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 22:28:34 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 06 22:32:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 22:32:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 06 22:36:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 22:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 22:36:37 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 06 22:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 22:38:13 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 22:38:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 22:38:39 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 06 22:43:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 22:46:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 06 22:46:54 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Aug 06 22:48:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 06 22:48:21 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 06 22:49:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e5f185000, cur 1565156958 expire 1565156808 last 1565156731 Aug 06 22:49:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 06 22:49:36 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 06 22:53:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 22:57:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 06 22:57:07 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 06 22:58:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 22:58:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 06 22:59:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 22:59:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 06 23:04:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 06 23:04:24 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 06 23:07:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 23:07:15 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 06 23:08:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 06 23:08:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 06 23:09:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 23:09:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 06 23:17:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 23:17:17 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Aug 06 23:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 06 23:18:43 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 06 23:19:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 23:19:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 23:19:49 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 06 23:27:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 23:27:32 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 06 23:28:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 23:28:52 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 23:32:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 23:32:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 06 23:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 06 23:37:54 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 06 23:39:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 23:39:13 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 06 23:42:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 06 23:42:56 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 06 23:45:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 23:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 06 23:49:06 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 06 23:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 06 23:49:33 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 06 23:51:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 06 23:51:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 06 23:52:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 06 23:52:57 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 06 23:54:30 fir-md1-s1 kernel: LustreError: 27583:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f19659f5050 x1632261277743632/t0(0) o4->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:5/0 lens 488/448 e 0 to 0 dl 1565160875 ref 1 fl Interpret:/0/0 rc 0/0 Aug 06 23:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6), client will retry: rc = -110 Aug 06 23:55:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 06 23:55:01 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 06 23:59:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 06 23:59:27 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 06 23:59:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 06 23:59:56 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 07 00:02:58 fir-md1-s1 kernel: LustreError: 20502:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1d06734050 x1632261277811136/t0(0) o4->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:27/0 lens 488/448 e 0 to 0 dl 1565161407 ref 1 fl Interpret:/0/0 rc 0/0 Aug 07 00:02:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6), client will retry: rc = -110 Aug 07 00:03:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 00:03:02 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 07 00:06:39 fir-md1-s1 kernel: Lustre: 23603:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565161592/real 1565161592] req@ffff8f2e13fa0000 x1636756468570592/t0(0) o104->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565161599 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 00:06:39 fir-md1-s1 kernel: Lustre: 23603:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 07 00:07:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 00:07:25 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 00:09:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 00:09:34 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 07 00:10:03 fir-md1-s1 kernel: Lustre: 97645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f226f54d400 x1632261277890240/t0(0) o101->64093eed-1899-7457-95e6-ff7526581ffb@10.8.10.21@o2ib6:8/0 lens 376/1600 e 0 to 0 dl 1565161808 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 00:10:03 fir-md1-s1 kernel: Lustre: 97645:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 07 00:10:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 00:10:06 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 00:10:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2c9b780fc0/0x5d9ee6b23d065ac1 lrc: 3/0,0 mode: PR/PR res: [0x2c002c7fc:0x19:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c828f4 expref: 982 pid: 23704 timeout: 4276867 lvb_type: 0 Aug 07 00:10:07 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2e675bbc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f0cb6325c40/0x5d9ee6b23d066812 lrc: 3/0,0 mode: EX/EX res: [0x2c002c7fc:0x19:0x0].0x0 bits 0x8/0x0 rrc: 4 type: IBT flags: 0x50000000000000 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85c828fb expref: 676 pid: 24580 timeout: 0 lvb_type: 3 Aug 07 00:13:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 00:13:19 
fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 07 00:19:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 00:19:54 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 07 00:20:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 00:20:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 00:23:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 00:23:45 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 00:25:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 00:25:25 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Aug 07 00:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 00:30:06 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 07 00:30:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 00:30:10 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 07 00:34:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 00:34:03 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 07 00:35:59 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 07 00:36:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 07 00:36:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 00:39:57 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 07 00:40:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 00:40:07 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 07 00:40:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 00:40:33 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 07 00:45:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 00:45:16 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 00:49:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 00:49:51 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 00:50:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 00:50:09 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 07 00:50:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 00:50:56 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 07 00:55:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 00:55:37 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 07 01:00:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 01:00:11 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 07 01:01:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 01:01:05 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 01:04:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 01:04:09 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 01:05:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 01:05:38 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 07 01:10:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 01:10:29 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 07 01:11:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 01:11:09 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 01:15:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 01:15:40 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 01:17:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 01:17:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 01:20:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 01:20:33 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 07 01:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 01:21:22 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 01:25:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 01:25:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 01:28:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 01:28:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 01:30:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 01:30:50 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 07 01:31:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 01:31:42 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 01:36:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 01:36:36 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 07 01:40:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 01:40:51 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 07 01:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 
10.8.11.6@o2ib6) reconnecting Aug 07 01:42:07 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 07 01:46:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 01:46:37 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 07 01:50:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1505446800, cur 1565167813 expire 1565167663 last 1565167586 Aug 07 01:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 01:51:10 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 07 01:52:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 01:52:23 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 01:53:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 01:53:34 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 01:55:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 01:56:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 01:56:38 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 07 02:01:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 02:01:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 02:01:12 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 07 02:02:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 02:02:36 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 02:06:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 02:06:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 02:06:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 02:06:40 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 07 02:11:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 02:11:22 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Aug 07 02:13:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 02:13:46 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 02:16:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 02:16:41 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 02:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 02:18:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 02:21:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 02:21:42 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 07 02:23:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 02:23:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 02:27:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 02:27:17 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 02:30:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 02:30:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 02:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 02:31:49 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 07 02:35:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 02:35:10 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 02:37:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 02:37:31 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 02:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f17ad0a9400, cur 1565170779 expire 1565170629 last 1565170552 Aug 07 02:40:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 02:40:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 02:41:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 02:41:49 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 07 02:45:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 02:45:36 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 07 02:48:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 02:48:07 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 07 02:50:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 02:50:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 02:52:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 02:52:24 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 07 02:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 02:56:07 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 07 03:01:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 03:01:08 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 03:02:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 03:02:27 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 07 03:05:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 03:06:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 03:06:13 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 03:11:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 03:11:08 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 07 03:13:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 03:13:11 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 07 03:16:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 03:16:38 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 03:17:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 03:17:46 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 03:21:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 03:21:15 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 07 03:23:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 03:23:14 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 07 03:26:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 03:26:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 03:28:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1a03dfc000, cur 1565173703 expire 1565173553 last 1565173476 Aug 07 03:31:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 03:31:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 03:31:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 03:31:32 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 03:33:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 03:33:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 07 03:36:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 03:36:44 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 03:41:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 03:41:45 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 03:43:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 07 03:43:00 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 03:43:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 03:43:49 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 07 03:46:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 03:46:54 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 03:53:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 03:53:05 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 03:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 03:54:06 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 07 03:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 03:56:58 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 03:58:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 03:58:59 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 07 04:03:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 04:03:40 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 04:04:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 04:04:07 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 07 04:07:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 04:07:07 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 07 04:14:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 04:14:22 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 07 04:14:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 04:14:22 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 07 04:17:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 04:17:29 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 04:24:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 04:24:26 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 04:24:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 04:24:39 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 04:25:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 04:25:39 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 04:27:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 04:27:42 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 07 04:29:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 04:29:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 04:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 04:34:53 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 04:36:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 04:36:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 07 04:37:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 04:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 04:37:47 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 04:43:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31faf69800, cur 1565178195 expire 1565178045 last 1565177968 Aug 07 04:44:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 04:44:59 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Aug 07 04:46:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 04:46:30 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 07 04:47:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 04:47:56 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 04:49:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 04:54:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 04:54:59 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 07 04:57:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 07 04:57:04 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 07 04:58:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 04:58:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 05:02:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 05:02:03 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 05:05:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 05:05:11 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 07 05:07:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 05:07:14 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 07 05:08:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 05:08:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 05:12:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 05:12:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 05:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 05:15:45 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 07 05:17:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 05:17:52 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 07 05:18:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f249a6f5c00, cur 1565180288 expire 1565180138 last 1565180061 Aug 07 05:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 05:18:45 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 05:25:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 05:25:50 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 07 05:28:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 05:28:49 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 05:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 05:28:51 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 05:35:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 05:35:55 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 07 05:36:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 07 05:36:16 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 05:39:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 05:39:05 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 05:39:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 05:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 05:39:44 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 05:42:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 05:42:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 05:45:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 05:45:57 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Aug 07 05:49:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 05:49:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 05:49:41 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 07 05:49:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 05:49:56 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 05:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 05:56:00 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 07 05:59:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f351a0f2c00, cur 1565182766 expire 1565182616 last 1565182539 Aug 07 06:00:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 06:00:03 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 06:00:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 06:00:19 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 06:03:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 06:03:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 06:06:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 06:06:10 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 07 06:09:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252119e800, cur 1565183343 expire 1565183193 last 1565183116 Aug 07 06:10:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 06:10:14 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 06:10:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 06:10:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 06:16:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 06:16:14 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 07 06:20:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1de5295c00, cur 1565184000 expire 1565183850 last 1565183773 Aug 07 06:20:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 06:20:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 06:21:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 06:21:14 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 07 06:26:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 06:26:50 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Aug 07 06:29:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 06:29:27 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 07 06:30:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 06:30:38 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 06:33:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 06:33:43 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 07 06:36:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 06:36:52 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 07 06:37:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 06:40:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 06:40:44 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 06:41:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 06:44:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 06:44:06 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 07 06:47:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 06:47:09 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 07 06:50:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 06:50:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 06:52:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 06:52:47 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 06:54:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 06:54:15 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 07 06:57:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 06:57:17 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 07 07:01:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 07:01:08 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 07:04:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 07:04:18 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 07 07:04:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 07:04:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 07:07:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 07:07:35 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 07 07:11:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 07:11:10 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 07:12:29 fir-md1-s1 kernel: Lustre: 26888:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565187142/real 1565187142] req@ffff8f296f000000 x1636756675483120/t0(0) o105->fir-MDT0002@10.8.10.21@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1565187149 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 07:14:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 07:14:52 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 07:15:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 07:15:15 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 07:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 07:17:39 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 07 07:21:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33e3859800, cur 1565187662 expire 1565187512 last 1565187435 Aug 07 07:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 07:21:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 07:25:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 07:25:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 07:27:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 07:27:43 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 07 07:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 07:31:27 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 07 07:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d97799b0-2495-2be7-ae58-c130c06c5e4f (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f276028a800, cur 1565188465 expire 1565188315 last 1565188238 Aug 07 07:35:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 07:35:02 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 07 07:37:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 07:37:57 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Aug 07 07:40:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 07:40:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 07:41:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 07:41:31 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 07:45:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 07:45:07 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 07 07:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 07:48:22 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 07 07:51:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 07:51:38 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 07 07:55:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 07:55:16 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 07 07:57:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 07:57:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 07:58:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 07:59:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 07:59:06 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 07 08:01:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 08:02:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 08:02:12 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 08:02:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 08:06:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 07 08:06:14 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 08:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 08:09:30 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 07 08:11:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 08:12:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 08:12:23 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 07 08:17:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 08:17:26 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 07 08:18:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 08:19:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 08:19:37 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 07 08:23:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 08:23:31 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 08:27:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 08:27:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 08:27:57 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 07 08:29:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 08:29:40 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 07 08:33:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 08:33:33 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 08:38:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 08:38:03 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 07 08:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 08:39:42 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Aug 07 08:43:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 08:43:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 08:43:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 08:43:39 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 08:48:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 08:48:04 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 07 08:49:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 08:49:43 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 07 08:53:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 08:53:14 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 08:53:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 08:53:41 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 08:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ae9081b9-15b7-d037-713b-67343872796f (at 10.9.104.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1580510400, cur 1565193499 expire 1565193349 last 1565193272 Aug 07 08:58:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 07 08:59:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 08:59:03 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 07 08:59:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 08:59:49 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Aug 07 09:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 09:03:52 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 09:09:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 09:09:09 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 07 09:09:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 09:09:50 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 07 09:12:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 09:12:19 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 09:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 09:14:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 09:19:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 09:19:22 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 07 09:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 09:19:56 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 07 09:24:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 09:24:00 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 09:27:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 09:27:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 09:29:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 09:29:35 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 09:29:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 09:29:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 09:29:59 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 07 09:34:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 09:34:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 09:39:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 09:39:19 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 09:39:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 09:39:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 09:40:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 09:40:01 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 07 09:44:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 09:44:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 09:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 09:50:03 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 07 09:50:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 09:50:31 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 07 09:54:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 09:54:28 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 10:00:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 10:00:05 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 07 10:01:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 10:01:16 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 07 10:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 10:05:13 fir-md1-s1 kernel: Lustre: Skipped 
40 previous similar messages Aug 07 10:10:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 10:10:22 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 07 10:11:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 10:11:47 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 10:15:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 10:15:24 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 10:17:56 fir-md1-s1 kernel: Lustre: 27318:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2cff475a00 x1640754170791888/t0(0) o101->30afcc6f-4b54-6899-62fc-e555017235fa@10.8.14.1@o2ib6:1/0 lens 480/568 e 1 to 0 dl 1565198281 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:18:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.23.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1c586d45c0/0x5d9ee6b4ecf17186 lrc: 3/0,0 mode: PW/PW res: [0x2c002c78a:0x2d:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.23.6@o2ib6 remote: 0xa0f82f92b9d28004 expref: 19 pid: 97661 timeout: 4313350 lvb_type: 0 Aug 07 10:18:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 10:18:11 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 10:18:31 fir-md1-s1 kernel: Lustre: 20461:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565198304/real 1565198304] req@ffff8f1e3f1d0900 x1636756760762160/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565198311 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 10:18:38 fir-md1-s1 kernel: Lustre: 20461:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565198311/real 1565198311] req@ffff8f1e3f1d0900 x1636756760762160/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565198318 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 07 10:18:44 fir-md1-s1 kernel: Lustre: 20545:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565198317/real 1565198317] req@ffff8f1e3f1d6c00 x1636756760798608/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565198324 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 10:18:49 fir-md1-s1 kernel: Lustre: 26254:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1feaf89200 x1635699542792224/t0(0) o101->62873e5a-5401-394e-2139-5fd47462d1df@10.8.29.2@o2ib6:24/0 lens 480/568 e 0 to 0 dl 1565198334 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:18:56 fir-md1-s1 kernel: Lustre: 97647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565198328/real 1565198328] req@ffff8f237b731200 x1636756760844192/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565198335 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 07 10:18:56 fir-md1-s1 kernel: Lustre: 20461:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565198328/real 1565198328] req@ffff8f1e3f1d0900 x1636756760762160/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 
lens 296/224 e 0 to 1 dl 1565198335 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 07 10:18:56 fir-md1-s1 kernel: Lustre: 20461:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 07 10:18:56 fir-md1-s1 kernel: LustreError: 20461:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) failed to reply to blocking AST (req@ffff8f1e3f1d0900 x1636756760762160 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f244faec800/0x5d9ee6b4ecaefac4 lrc: 4/0,0 mode: PR/PR res: [0x2000297f6:0x5192:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8dd0f expref: 1832667 pid: 24579 timeout: 4313418 lvb_type: 0 Aug 07 10:18:56 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 07 10:18:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 31s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f244faec800/0x5d9ee6b4ecaefac4 lrc: 3/0,0 mode: PR/PR res: [0x2000297f6:0x5192:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8dd0f expref: 1832668 pid: 24579 timeout: 0 lvb_type: 0 Aug 07 10:18:56 fir-md1-s1 kernel: Lustre: 97647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 07 10:19:06 fir-md1-s1 kernel: LustreError: 23733:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ff89c8600 x1636756761038400/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:19:10 fir-md1-s1 kernel: LustreError: 23704:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ba678a700 x1636756761068544/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:19:31 fir-md1-s1 
kernel: Lustre: 23760:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f29ebd75d00 x1636469552520192/t0(0) o101->9eed212b-34d9-6e26-f1ac-cdc452decf97@10.8.29.3@o2ib6:6/0 lens 480/568 e 0 to 0 dl 1565198376 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:19:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f197bfed7c0/0x5d9ee6b4eb59d8bc lrc: 3/0,0 mode: PR/PR res: [0x200029790:0x1d61:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdb491d0e expref: 1455016 pid: 23454 timeout: 4313435 lvb_type: 0 Aug 07 10:19:39 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2ec7fec140/0x5d9ee6b4ecb4cbbe lrc: 3/0,0 mode: PR/PR res: [0x200029f20:0x141:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8e1ca expref: 1436309 pid: 23741 timeout: 4313439 lvb_type: 0 Aug 07 10:20:13 fir-md1-s1 kernel: LustreError: 23747:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f311eb3c200 x1636756761652464/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:20:19 fir-md1-s1 kernel: LustreError: 21457:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1e3f1d7b00 x1636756761697360/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:20:26 fir-md1-s1 kernel: LustreError: 20461:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565198335, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff8f1a0221aac0/0x5d9ee6b4ed2e9586 lrc: 3/0,1 mode: --/PW res: [0x2000297f6:0x5192:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20461 timeout: 0 lvb_type: 0 Aug 07 10:20:26 fir-md1-s1 kernel: LustreError: 20461:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 07 10:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2d524c01-86ec-17e0-96b8-b2219eeea319 (at 10.8.29.2@o2ib6) Aug 07 10:20:28 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 07 10:20:35 fir-md1-s1 kernel: LustreError: 97647:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f19e60ecb00 x1636756761843344/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:20:36 fir-md1-s1 kernel: LustreError: 23733:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565198346, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3244d42880/0x5d9ee6b4ed7a9a58 lrc: 3/0,1 mode: --/PW res: [0x200029790:0x1d61:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23733 timeout: 0 lvb_type: 0 Aug 07 10:20:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1fd21dd100/0x5d9ee6b4ecb1d1e4 lrc: 3/0,0 mode: PR/PR res: [0x2000297f6:0x518f:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8df46 expref: 1208952 pid: 24579 timeout: 4313503 lvb_type: 0 Aug 07 10:20:44 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f237e52b300 x1636518672970816/t0(0) 
o101->3429bec6-fe2a-19ec-4f0c-bb576fed4ff4@10.8.29.4@o2ib6:19/0 lens 480/568 e 0 to 0 dl 1565198449 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:20:44 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 07 10:20:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 10:20:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 10:21:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f324ad4cec0/0x5d9ee6b4ec9daf8a lrc: 3/0,0 mode: PR/PR res: [0x200029790:0x1d4e:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8547a expref: 1150524 pid: 23714 timeout: 4313524 lvb_type: 0 Aug 07 10:21:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 07 10:21:43 fir-md1-s1 kernel: LustreError: 23747:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565198413, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f259ad0e300/0x5d9ee6b4edc982ec lrc: 3/0,1 mode: --/PW res: [0x2000297f6:0x518f:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23747 timeout: 0 lvb_type: 0 Aug 07 10:21:45 fir-md1-s1 kernel: LNet: Service thread pid 20461 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 07 10:21:45 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 07 10:21:45 fir-md1-s1 kernel: Pid: 20461, comm: mdt01_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 07 10:21:45 fir-md1-s1 kernel: Call Trace: Aug 07 10:21:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 07 10:21:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 07 10:21:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 07 10:21:45 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 07 10:21:45 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 07 10:21:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 07 10:21:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 07 10:21:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 07 10:21:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 07 10:21:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 07 10:21:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565198505.20461 Aug 07 10:21:49 fir-md1-s1 kernel: LustreError: 21457:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565198419, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2115800d80/0x5d9ee6b4edcfa490 lrc: 3/0,1 mode: --/PW res: 
[0x200029dbc:0xa32:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21457 timeout: 0 lvb_type: 0 Aug 07 10:21:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 10:21:54 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 10:22:05 fir-md1-s1 kernel: LustreError: 97647:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565198435, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f16c6a55c40/0x5d9ee6b4ede386cb lrc: 3/0,1 mode: --/PW res: [0x200029790:0x1d4e:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97647 timeout: 0 lvb_type: 0 Aug 07 10:22:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eee31ffd-3f22-28a6-5ce3-2d79d96aab8a (at 10.8.23.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e22647800, cur 1565198532 expire 1565198382 last 1565198305 Aug 07 10:22:12 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 07 10:22:24 fir-md1-s1 kernel: LustreError: 20719:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f237b735a00 x1636756762581328/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:22:27 fir-md1-s1 kernel: LNet: Service thread pid 23733 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 07 10:22:27 fir-md1-s1 kernel: Pid: 23733, comm: mdt02_087 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 07 10:22:27 fir-md1-s1 kernel: Call Trace: Aug 07 10:22:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 07 10:22:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 07 10:22:27 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 07 10:22:27 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 07 10:22:27 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 07 10:22:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 07 10:22:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 07 10:22:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 07 10:22:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 07 10:22:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 07 10:22:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565198547.23733 Aug 07 10:22:49 fir-md1-s1 kernel: Lustre: 97640:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1c6e408300 x1631626214270016/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:24/0 lens 504/568 e 0 to 0 dl 1565198574 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:22:49 fir-md1-s1 kernel: Lustre: 
97640:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 07 10:22:53 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1aacf9bcc0/0x5d9ee6b4ecac4259 lrc: 3/0,0 mode: PR/PR res: [0x200029d38:0x69e1:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8d54b expref: 905662 pid: 20545 timeout: 4313633 lvb_type: 0 Aug 07 10:23:34 fir-md1-s1 kernel: LNet: Service thread pid 23747 was inactive for 200.67s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 07 10:23:34 fir-md1-s1 kernel: Pid: 23747, comm: mdt02_098 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 07 10:23:34 fir-md1-s1 kernel: Call Trace: Aug 07 10:23:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 07 10:23:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 07 10:23:34 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 07 10:23:34 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 07 10:23:34 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 07 10:23:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 07 10:23:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 
[ptlrpc] Aug 07 10:23:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 07 10:23:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 07 10:23:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 07 10:23:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565198614.23747 Aug 07 10:23:39 fir-md1-s1 kernel: LNet: Service thread pid 21457 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 07 10:23:39 fir-md1-s1 kernel: Pid: 21457, comm: mdt01_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 07 10:23:39 fir-md1-s1 kernel: Call Trace: Aug 07 10:23:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 07 10:23:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 07 10:23:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 07 10:23:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 07 10:23:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 07 10:23:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 07 10:23:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 07 10:23:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 07 10:23:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 07 10:23:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 07 10:23:39 fir-md1-s1 kernel: LustreError: dumping 
log to /tmp/lustre-log.1565198619.21457 Aug 07 10:23:54 fir-md1-s1 kernel: LustreError: 20719:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565198544, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f220c002880/0x5d9ee6b4ee7ece7b lrc: 3/0,1 mode: --/PW res: [0x200029d38:0x69e1:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20719 timeout: 0 lvb_type: 0 Aug 07 10:25:07 fir-md1-s1 kernel: LNet: Service thread pid 23733 completed after 360.84s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 07 10:25:08 fir-md1-s1 kernel: LNet: Service thread pid 20461 completed after 403.67s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 07 10:25:08 fir-md1-s1 kernel: LustreError: 23695:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ba678bf00 x1636756764044160/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:25:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 10:25:08 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 10:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3429bec6-fe2a-19ec-4f0c-bb576fed4ff4 (at 10.8.29.4@o2ib6) reconnecting Aug 07 10:25:29 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 07 10:25:43 fir-md1-s1 kernel: LustreError: 97640:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f19e60eb300 x1636756764403296/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:26:08 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f160269e900 x1631626214430496/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:13/0 lens 480/568 e 0 to 0 dl 1565198773 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:26:13 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f34b608d7c0/0x5d9ee6b4ecac64e2 lrc: 3/0,0 mode: PR/PR res: [0x200029d38:0x69df:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8d598 expref: 575711 pid: 23741 timeout: 4313833 lvb_type: 0 Aug 07 10:30:20 fir-md1-s1 kernel: LNet: Service thread pid 21457 completed after 601.69s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 07 10:30:20 fir-md1-s1 kernel: LustreError: 21457:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1e3f1d0c00 x1636756766556896/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:30:20 fir-md1-s1 kernel: LustreError: 21457:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Aug 07 10:30:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 10:30:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 10:30:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 10:30:42 fir-md1-s1 kernel: Lustre: Skipped 159 previous similar messages Aug 07 10:30:45 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f230bda5100 x1636518673071696/t0(0) o101->3429bec6-fe2a-19ec-4f0c-bb576fed4ff4@10.8.29.4@o2ib6:20/0 lens 480/568 e 0 to 0 dl 1565199050 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 10:30:50 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2ee55f1d40/0x5d9ee6b4ecad894c lrc: 3/0,0 mode: PR/PR res: [0x200029dbc:0xa30:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdba8d704 expref: 228667 pid: 23741 timeout: 4314110 lvb_type: 0 Aug 07 10:31:48 fir-md1-s1 kernel: LNet: Service thread pid 23747 completed after 694.74s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 07 10:31:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 10:31:55 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 07 10:32:49 fir-md1-s1 kernel: LustreError: 21379:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ff89cbc00 x1636756767449184/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 07 10:32:49 fir-md1-s1 kernel: LustreError: 21379:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Aug 07 10:33:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f19143e2f40/0x5d9ee6b4ecca8403 lrc: 3/0,0 mode: PR/PR res: [0x200029e31:0x40:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x3a2b829cdbafcfe1 expref: 70954 pid: 20719 timeout: 4314258 lvb_type: 0 Aug 07 10:33:18 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 07 10:35:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 10:35:50 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 07 10:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 10:40:43 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 07 10:41:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 10:41:56 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 10:46:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 10:46:04 fir-md1-s1 
kernel: Lustre: Skipped 35 previous similar messages Aug 07 10:46:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 10:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 10:50:52 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 07 10:54:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 10:54:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 07 10:56:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 10:56:15 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 07 11:00:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 11:00:56 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 07 11:04:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 11:04:37 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 11:06:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 11:06:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 11:06:38 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 07 11:11:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 11:11:04 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 07 11:15:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 11:15:03 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 11:16:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 11:16:43 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 11:21:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 11:21:09 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 07 11:26:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 11:26:27 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 11:26:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 11:26:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 11:29:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cfe35abf-8553-eb7a-a41e-f23d64904648 (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2598fff400, cur 1565202588 expire 1565202438 last 1565202361 Aug 07 11:31:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 11:31:22 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 07 11:31:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 11:31:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 11:36:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 11:36:29 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 07 11:37:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 11:37:05 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 07 11:41:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 11:41:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 11:41:49 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 07 11:46:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 11:46:35 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 11:47:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 11:47:18 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 11:51:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 11:51:49 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 07 11:53:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 11:53:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 11:56:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 11:56:53 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 07 11:57:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 11:57:41 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 12:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 12:02:18 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Aug 07 12:07:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 12:07:18 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 07 12:07:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 12:07:47 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 07 12:12:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 12:12:18 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 07 12:17:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 12:17:59 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 12:18:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 12:18:29 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 12:22:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 12:22:29 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar 
messages Aug 07 12:23:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 12:23:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 12:26:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 12:28:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 12:28:31 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 07 12:28:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 12:28:32 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 07 12:32:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 12:32:40 fir-md1-s1 kernel: Lustre: Skipped 120 previous similar messages Aug 07 12:33:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 12:38:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 12:38:40 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 12:39:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 12:39:52 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 07 12:42:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 12:42:46 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Aug 07 12:48:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 12:48:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 12:48:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 12:48:41 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 07 12:49:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 12:49:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 12:50:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 12:50:23 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 07 12:53:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 12:53:09 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Aug 07 12:58:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 12:58:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 12:58:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 12:58:53 fir-md1-s1 kernel: Lustre: Skipped 161310 previous similar messages Aug 07 13:01:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 13:01:19 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 07 13:03:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 13:03:12 fir-md1-s1 kernel: Lustre: Skipped 161353 previous similar messages Aug 07 13:09:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 13:09:23 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Aug 07 13:11:45 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 07 13:11:45 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 12 previous similar messages Aug 07 13:13:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 13:13:22 fir-md1-s1 kernel: Lustre: Skipped 175 previous similar messages Aug 07 13:13:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 13:13:52 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 13:17:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6052c41b-9004-bcc3-dbad-bff4bc2f2f04 (at 10.8.14.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f7769fc00, cur 1565209030 expire 1565208880 last 1565208803 Aug 07 13:17:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 07 13:17:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6052c41b-9004-bcc3-dbad-bff4bc2f2f04 (at 10.8.14.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2507bb4400, cur 1565209046 expire 1565208896 last 1565208819 Aug 07 13:17:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 07 13:17:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 13:19:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 13:19:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 13:19:25 fir-md1-s1 kernel: Lustre: Skipped 105 previous similar messages Aug 07 13:19:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 13:23:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 13:23:32 fir-md1-s1 kernel: Lustre: Skipped 96 previous similar messages Aug 07 13:23:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 13:23:59 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 13:28:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 13:29:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 13:29:35 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 07 13:33:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 13:33:48 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 07 13:34:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 13:34:45 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 13:35:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 13:35:23 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 13:39:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 13:39:36 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 13:43:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 13:43:58 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 07 13:45:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 13:45:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 13:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 13:49:46 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 07 13:51:09 fir-md1-s1 kernel: Lustre: 23633:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565211061/real 1565211061] req@ffff8f10f526c800 x1636756910116320/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565211068 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 13:51:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 13:53:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 13:53:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 13:53:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 13:53:59 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 07 13:54:57 fir-md1-s1 kernel: Lustre: 23691:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565211290/real 1565211290] req@ffff8f0afea3b300 x1636756912438064/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565211297 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 13:56:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 13:56:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 13:56:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 13:56:21 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 13:59:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 13:59:51 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 07 14:04:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 14:04:10 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 07 14:07:32 fir-md1-s1 kernel: Lustre: 23077:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565212045/real 1565212045] req@ffff8f293d7a2a00 x1636756921222336/t0(0) o104->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565212052 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 14:07:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 
10.8.11.6@o2ib6, removing former export from same NID Aug 07 14:07:34 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 07 14:10:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 14:10:19 fir-md1-s1 kernel: Lustre: Skipped 10410 previous similar messages Aug 07 14:14:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 14:14:11 fir-md1-s1 kernel: Lustre: Skipped 10439 previous similar messages Aug 07 14:17:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 14:17:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 14:18:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 07 14:18:46 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 14:20:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 14:20:34 fir-md1-s1 kernel: Lustre: Skipped 15060 previous similar messages Aug 07 14:24:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 14:24:44 fir-md1-s1 kernel: Lustre: Skipped 15072 previous similar messages Aug 07 14:30:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 14:30:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 14:30:36 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 07 14:30:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 14:30:40 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 07 14:32:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 14:34:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 14:34:50 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 07 14:35:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 14:40:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 14:40:41 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 14:41:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 14:41:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 14:44:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 14:44:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 14:44:52 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 07 14:50:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 14:50:54 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 14:51:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 14:51:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 14:52:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 14:54:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 14:54:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 14:54:54 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 07 15:01:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 15:01:32 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 15:01:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 15:01:41 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 07 15:04:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 15:04:55 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 07 15:07:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 15:10:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 15:11:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 15:11:49 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 15:12:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 15:13:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 15:13:09 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 07 15:15:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 15:15:23 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 07 15:19:06 fir-md1-s1 kernel: Lustre: 23077:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565216339/real 1565216339] req@ffff8f2f1ec05100 x1636756974455792/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565216346 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 07 15:21:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 15:21:54 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 15:23:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 15:23:51 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 15:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 15:25:23 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 07 15:29:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 15:32:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 15:32:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 15:32:04 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 07 15:33:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 15:33:57 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 15:34:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 15:35:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 15:35:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 15:35:42 fir-md1-s1 kernel: Lustre: Skipped 107 previous similar messages Aug 07 15:37:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 15:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 15:42:13 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 15:43:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 07 15:43:58 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 07 15:45:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 15:45:56 fir-md1-s1 kernel: Lustre: Skipped 33768 previous similar messages Aug 07 15:48:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 15:51:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 15:52:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 15:52:32 fir-md1-s1 kernel: Lustre: Skipped 33755 previous similar messages Aug 07 15:55:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 15:55:01 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 15:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 15:56:07 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 07 15:57:34 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 07 15:58:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 16:02:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 16:02:56 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 07 16:06:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 16:06:13 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 07 16:08:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 16:08:14 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 07 16:11:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 16:11:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 16:12:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 16:12:56 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 07 16:16:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 16:16:22 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 07 16:18:12 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 07 16:18:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 16:18:21 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 07 16:22:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 16:22:58 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 07 16:26:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 16:26:33 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 07 16:26:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 16:26:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 16:29:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 16:29:45 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 07 16:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 16:33:03 fir-md1-s1 kernel: Lustre: Skipped 65881 previous similar messages Aug 07 16:36:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 16:36:38 fir-md1-s1 kernel: Lustre: Skipped 65904 previous similar messages Aug 07 16:39:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 16:39:23 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 16:41:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 16:41:22 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 07 16:43:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 16:43:10 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 16:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 16:46:56 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 07 16:49:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 16:49:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 16:52:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 16:52:40 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 07 16:53:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 16:53:14 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 16:57:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 16:57:09 fir-md1-s1 kernel: Lustre: Skipped 77748 previous similar messages Aug 07 17:03:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 17:03:05 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 07 17:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 17:03:52 fir-md1-s1 kernel: Lustre: Skipped 77743 previous similar messages Aug 07 17:07:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 17:07:13 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 07 17:09:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 17:09:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 17:13:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 17:13:07 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 07 17:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 17:13:54 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 07 17:17:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 17:17:25 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 07 17:19:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 17:19:42 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 17:23:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 17:23:50 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 07 17:24:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 17:24:01 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 07 17:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 17:27:38 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 07 17:34:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 17:34:12 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 17:34:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, 
removing former export from same NID Aug 07 17:34:35 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 07 17:37:38 fir-md1-s1 kernel: Lustre: 21540:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d06736c50 x1631353793095424/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:13/0 lens 488/448 e 1 to 0 dl 1565224663 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 17:37:38 fir-md1-s1 kernel: Lustre: 21540:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 07 17:37:43 fir-md1-s1 kernel: LustreError: 46552:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+0s req@ffff8f1d06736c50 x1631353793095424/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:13/0 lens 488/448 e 1 to 0 dl 1565224663 ref 1 fl Interpret:/0/0 rc 0/0 Aug 07 17:37:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 07 17:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 17:37:47 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 07 17:39:14 fir-md1-s1 kernel: Lustre: 23760:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:35s); client may timeout. 
req@ffff8f3009fac500 x1631353793109824/t440455088231(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:9/0 lens 376/1568 e 1 to 0 dl 1565224719 ref 1 fl Complete:/0/0 rc 0/0 Aug 07 17:40:38 fir-md1-s1 kernel: LustreError: 46535:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f25313ef050 x1631353793126288/t0(0) o4->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:26/0 lens 488/448 e 1 to 0 dl 1565224856 ref 1 fl Interpret:/0/0 rc 0/0 Aug 07 17:40:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6), client will retry: rc = -110 Aug 07 17:41:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 17:41:06 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 17:42:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 17:42:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 17:44:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 17:44:14 fir-md1-s1 kernel: Lustre: Skipped 213 previous similar messages Aug 07 17:44:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 17:44:46 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 17:48:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 17:48:35 fir-md1-s1 kernel: Lustre: Skipped 252 previous similar messages Aug 07 17:54:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 17:54:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 17:54:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 17:54:55 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 17:58:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 17:58:38 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 07 18:00:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 18:04:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 18:04:20 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 18:05:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 18:05:35 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 07 18:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 18:08:44 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Aug 07 18:08:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 18:09:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 18:11:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 18:14:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 18:14:29 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 07 18:15:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 18:15:56 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 18:16:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 18:16:19 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 07 18:18:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 18:18:44 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 07 18:24:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 18:24:30 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 18:25:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 18:25:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 18:26:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 18:26:24 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 07 18:28:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 18:28:45 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 07 18:34:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 18:34:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 18:36:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 18:36:43 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 07 18:38:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Aug 07 18:38:45 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 07 18:43:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 18:43:47 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 18:44:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 18:44:51 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 18:48:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 18:48:59 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 07 18:51:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 18:51:00 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 18:54:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 18:54:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 18:54:59 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 07 18:59:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 18:59:19 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 07 19:01:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 19:01:14 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 19:05:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 19:05:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 19:09:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 19:09:59 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 07 19:11:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 19:11:32 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 07 19:14:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 19:14:41 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 19:15:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 19:15:08 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 19:20:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 19:20:09 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 07 19:21:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 19:21:55 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 19:25:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 19:25:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 19:26:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 19:26:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 19:30:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 19:30:11 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 07 19:31:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 19:31:56 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 07 19:35:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 19:35:31 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 19:36:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9970e629-c4a2-c189-538c-35266faf501a (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b9117b400, cur 1565231815 expire 1565231665 last 1565231588 Aug 07 19:36:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 07 19:37:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 19:37:07 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 19:40:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 19:40:30 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 07 19:44:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 19:44:54 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 19:45:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 19:45:37 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 07 19:50:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 19:50:45 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 19:55:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 19:55:44 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 19:55:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 19:55:48 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 07 19:58:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 19:58:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 20:00:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 20:00:49 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 07 20:01:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 20:01:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 20:05:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 20:05:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 20:06:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 20:06:14 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 07 20:06:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 20:06:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 20:10:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 20:10:56 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 07 20:15:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 20:15:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 20:17:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 20:17:46 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 07 20:18:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 20:18:12 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 07 20:21:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 20:21:35 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 07 20:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 20:25:57 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 07 20:28:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 20:28:22 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 20:29:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 20:29:17 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 20:31:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 20:31:39 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 07 20:36:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 20:36:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 20:39:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 20:39:35 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 20:39:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 20:39:45 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 07 20:41:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 20:41:48 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 07 20:46:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 20:46:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 07 20:49:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 20:49:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 07 20:50:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 20:50:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 20:52:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 20:52:00 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 07 20:57:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 20:57:06 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 07 21:00:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 21:00:13 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 07 21:00:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 21:00:34 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 21:02:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 21:02:16 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 07 21:07:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f26722a8400, cur 1565237236 expire 1565237086 last 1565237009 Aug 07 21:07:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 07 21:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 21:07:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 07 21:10:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 21:10:23 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 07 21:11:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 21:11:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 21:12:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 21:12:25 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 07 21:17:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 21:17:33 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 21:20:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 21:20:52 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 07 21:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 07 21:22:29 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 07 21:25:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 21:25:09 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 07 21:28:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 21:28:01 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 21:29:27 fir-md1-s1 kernel: Lustre: 14792:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f09e42b4c50 x1633926861390912/t0(0) o3->a7e9d272-b0d3-4359-c385-5d7a30e45350@10.9.101.54@o2ib4:2/0 lens 488/440 e 1 to 0 dl 1565238572 ref 2 fl Interpret:/0/0 rc 0/0 Aug 07 21:29:27 fir-md1-s1 kernel: Lustre: 14792:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 07 21:30:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 21:30:53 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 07 21:32:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 21:32:42 fir-md1-s1 kernel: Lustre: Skipped 139 previous similar messages Aug 07 21:38:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 21:38:01 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 21:42:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 21:42:30 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 07 21:42:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 21:42:46 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 07 21:42:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 21:42:48 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 07 21:48:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 21:48:05 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 07 21:53:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 07 21:53:00 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 07 21:53:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 21:53:05 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 07 21:56:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 21:56:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 21:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 21:58:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 07 22:03:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 22:03:11 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 07 22:03:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 22:03:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 22:06:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 22:06:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 22:08:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 22:08:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 07 22:13:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 22:13:20 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 07 22:13:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 22:13:30 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 07 22:19:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 22:19:27 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 22:21:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 
10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 22:21:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 22:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 22:23:30 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 07 22:23:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 22:23:56 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 22:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 22:29:42 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 07 22:31:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 22:31:54 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 22:33:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 22:33:34 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 07 22:35:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 22:35:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 07 22:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 22:39:44 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 07 22:43:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 07 22:43:46 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 07 22:46:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 07 22:46:57 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 07 22:47:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 22:47:55 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 22:49:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 22:49:55 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 22:54:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 22:54:03 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 07 22:58:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 22:58:50 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 07 23:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 23:00:02 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 23:02:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 23:02:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 23:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 23:04:03 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 07 23:09:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 07 23:09:10 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 07 23:10:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 07 23:10:07 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 23:14:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 23:14:04 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 07 23:14:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 07 23:20:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 23:20:01 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 07 23:20:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 07 23:20:22 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 07 23:24:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 23:24:24 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 07 23:30:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 07 23:30:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 07 23:30:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 07 23:30:41 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 07 23:31:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 07 23:31:02 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 07 23:34:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 23:34:30 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 07 23:40:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 07 23:40:46 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 07 23:41:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 23:41:27 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 07 23:44:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 07 23:44:43 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 07 23:48:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 07 23:48:41 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 07 23:50:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 07 23:50:58 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 07 23:51:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 07 23:51:32 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 07 23:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 07 23:54:45 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 08 00:01:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 00:01:07 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 00:01:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 00:01:47 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 00:02:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 08 00:02:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 00:04:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 00:04:52 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 08 00:11:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 00:11:22 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 00:12:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 00:12:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 00:14:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 00:14:11 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 08 00:14:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 00:14:57 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 08 00:21:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 00:21:24 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 00:25:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 00:25:12 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 00:25:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 
10.8.12.12@o2ib6) Aug 08 00:25:12 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 08 00:29:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 00:31:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 00:31:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 00:35:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 00:35:37 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 00:35:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 00:35:37 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 08 00:41:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 00:41:10 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 00:42:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 00:42:08 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 00:45:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 00:45:41 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 08 00:48:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 00:48:26 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 00:48:33 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 08 00:52:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 00:52:13 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 08 00:55:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 00:55:49 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 08 00:58:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 00:58:53 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 08 01:00:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 01:00:41 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 01:02:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 01:02:50 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 01:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 01:05:58 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 08 01:09:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 01:09:39 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 08 01:10:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 01:10:42 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 01:13:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 01:13:03 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 01:16:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 01:16:08 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 01:19:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 01:19:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 08 01:20:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 01:20:46 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 01:23:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 01:23:22 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 01:26:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 01:26:08 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 08 01:29:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 01:29:54 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 08 01:33:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 01:33:34 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 01:35:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 01:35:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 01:36:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 01:36:09 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 08 01:39:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 01:39:59 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 01:43:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 01:43:34 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 01:46:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 01:46:20 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 08 01:50:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 01:50:13 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 08 01:52:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 01:52:36 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 01:53:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 01:53:40 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 08 01:56:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 01:56:51 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Aug 08 02:00:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 02:00:40 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 08 02:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 02:03:52 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 02:05:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 02:05:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 02:06:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 02:06:52 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 08 02:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 02:14:10 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 02:15:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 02:15:01 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 08 02:17:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 02:17:18 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 08 02:22:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 02:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 02:24:19 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 02:25:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 02:25:31 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 08 02:27:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 02:27:31 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 08 02:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 02:34:25 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 02:35:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 02:35:32 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 08 02:35:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 02:37:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 02:37:38 fir-md1-s1 kernel: Lustre: Skipped 85 previous similar messages Aug 08 02:44:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 02:44:28 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 02:47:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 02:47:01 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 08 02:47:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 02:47:40 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 08 02:53:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 02:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 02:54:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 02:57:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 02:57:05 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 02:57:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 02:57:51 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 08 03:04:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 03:04:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 03:04:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 03:04:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 03:07:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 03:07:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 08 03:08:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 03:08:02 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 08 03:14:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 03:14:55 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 03:17:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 03:17:25 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 08 03:18:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 03:18:03 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 08 03:18:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 03:18:57 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 03:25:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 03:25:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 03:27:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 03:27:49 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 08 03:28:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 03:28:17 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 08 03:33:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 03:35:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 03:35:16 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 03:38:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 03:38:01 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 08 03:38:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 03:38:26 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 08 03:45:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 03:45:18 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 03:46:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 08 03:46:42 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 03:48:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 03:48:05 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 08 03:48:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 03:48:30 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages Aug 08 03:55:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 03:55:45 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 03:58:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 03:58:06 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 08 03:58:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f27dae6d800, cur 1565261887 expire 1565261737 last 1565261660 Aug 08 03:58:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 03:58:56 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages Aug 08 03:59:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 03:59:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 04:06:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 04:06:09 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 04:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 04:08:59 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 04:09:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 04:09:00 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 04:10:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 04:10:45 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 04:16:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 04:16:15 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 04:19:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f31aebd5000, cur 1565263149 expire 1565262999 last 1565262922 Aug 08 04:19:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 04:19:14 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 08 04:20:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 04:20:00 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 04:25:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 04:25:06 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 04:26:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 04:26:27 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 04:29:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 04:29:39 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 08 04:30:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 04:30:05 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 04:37:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 04:37:04 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 04:37:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 04:37:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 04:39:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 04:39:43 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 08 04:40:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 04:40:30 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 08 04:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 04:47:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 04:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 04:49:46 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 08 04:50:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f11c4f32000, cur 1565265044 expire 1565264894 last 1565264817 Aug 08 04:51:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 04:51:23 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 04:57:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 04:57:38 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 04:59:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 04:59:50 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 08 05:01:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 05:01:44 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 08 05:03:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 05:03:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 05:07:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 05:07:41 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 05:09:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 05:09:57 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 08 05:13:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 05:13:24 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 08 05:17:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 05:17:48 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 05:19:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 05:19:58 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 08 05:23:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 05:23:28 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 08 05:26:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 05:26:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 05:27:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 08 05:27:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 05:27:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 08 05:28:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 05:30:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 05:30:10 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 05:35:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 05:35:28 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 08 05:38:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 05:38:05 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 05:39:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 05:39:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 05:40:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 05:40:15 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 08 05:47:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 05:47:11 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 05:48:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 05:48:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 05:48:06 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 05:50:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 05:50:39 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 05:57:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 05:57:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 05:58:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 05:58:28 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 08 06:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 06:00:46 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 08 06:04:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 08 06:04:21 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 06:08:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 06:08:08 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 08 06:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 06:08:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 06:09:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 06:10:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 06:11:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 06:11:25 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 08 06:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 06:18:14 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 06:18:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 06:18:42 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 08 06:18:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 06:18:58 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 06:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 06:21:41 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 08 06:28:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 06:28:53 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 06:29:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 06:29:00 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 06:31:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 06:31:44 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 08 06:37:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 06:37:23 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 06:38:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 06:38:59 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 08 06:39:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 06:39:20 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 06:42:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 06:42:04 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 06:49:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 06:49:02 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 08 06:49:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 06:49:34 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 06:52:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 06:52:05 fir-md1-s1 kernel: Lustre: Skipped 128 previous similar messages Aug 08 06:59:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 06:59:44 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 06:59:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 06:59:57 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 07:02:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 07:02:10 fir-md1-s1 kernel: Lustre: Skipped 90 previous 
similar messages Aug 08 07:07:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 07:10:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 07:10:05 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 07:10:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 07:10:16 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 08 07:12:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 07:12:34 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 08 07:18:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 07:20:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 07:20:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 07:20:29 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 07:20:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 07:20:58 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 07:22:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 07:22:51 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 08 07:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 07:30:34 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 07:31:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 07:31:01 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 07:32:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 07:32:54 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 08 07:40:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 07:40:38 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 07:42:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 07:42:55 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 08 07:42:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 07:42:55 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 08 07:50:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 07:50:42 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 07:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 07:53:12 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 08 07:54:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f317a986400, cur 1565276061 expire 1565275911 last 1565275834 Aug 08 07:57:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 07:57:08 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 07:57:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 08:00:04 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 08 08:01:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 08:01:00 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 08:03:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 08:03:13 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 08 08:05:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 08:07:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 08:07:32 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 08:11:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 08:11:13 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 08 08:13:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 08:13:30 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 08 08:18:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 08:18:09 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 08 08:21:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 08:21:22 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 08 08:23:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 08:23:45 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 08 08:25:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 08:28:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 08:28:10 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 08:32:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 08:32:24 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 08:33:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 08:33:46 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 08 08:38:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 08:38:17 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 08 08:42:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 08:42:52 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 08:43:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 08:43:52 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 08 08:48:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 08:48:45 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 08 08:50:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 08:52:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 08:52:52 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 08:53:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 08:53:53 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 08 08:59:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 08:59:00 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 09:03:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 09:03:01 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 09:03:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 09:03:54 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 08 09:08:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 09:09:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 09:09:07 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 08 09:13:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 09:13:01 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 08 09:14:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 09:14:00 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Aug 08 09:19:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 09:19:58 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 09:20:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 09:22:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 09:23:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 09:23:11 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 08 09:24:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 09:24:00 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 08 09:28:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 09:28:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 09:32:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 09:32:31 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 08 09:33:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 09:33:30 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 09:34:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 09:34:20 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 09:43:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 09:43:33 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 09:43:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 09:43:48 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 09:44:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 09:44:26 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 08 09:49:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 09:54:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 09:54:01 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 09:54:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 09:54:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 09:54:58 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 09:54:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 08 10:04:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 10:04:12 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 08 10:05:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 10:05:07 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 10:06:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 10:06:47 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 10:14:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 10:14:33 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 10:15:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 10:15:13 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 08 10:16:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 10:16:47 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 10:22:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't 
heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34d9f42400, cur 1565284967 expire 1565284817 last 1565284740 Aug 08 10:24:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 10:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 10:24:41 fir-md1-s1 kernel: Lustre: Skipped 7190 previous similar messages Aug 08 10:25:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 10:25:19 fir-md1-s1 kernel: Lustre: Skipped 7235 previous similar messages Aug 08 10:28:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 10:28:38 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 10:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 10:35:20 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 10:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 10:35:20 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 08 10:38:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 10:38:42 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 08 10:42:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 10:42:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 10:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 10:46:01 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 10:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 10:46:01 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 08 10:48:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 10:48:48 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 10:49:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 10:56:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 10:56:03 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 08 10:56:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 10:56:22 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 10:58:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 10:59:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 10:59:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 10:59:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 10:59:38 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 11:00:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 11:06:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 11:06:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 11:06:52 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 08 11:06:52 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 08 11:10:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 11:11:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 11:11:48 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 08 11:11:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 11:11:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 11:17:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 11:17:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 11:17:09 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 08 11:17:09 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 11:22:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 11:24:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 11:24:01 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 11:27:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 11:27:19 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 11:27:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 11:27:19 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 08 11:29:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 11:34:26 fir-md1-s1 kernel: Lustre: 10585:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565289259/real 1565289259] req@ffff8f32dd7bef00 x1636757751040752/t0(0) o104->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565289266 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 11:34:26 fir-md1-s1 kernel: Lustre: 10585:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 08 11:34:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 08 11:34:35 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 08 11:37:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 11:37:23 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 08 11:37:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 11:37:36 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 11:44:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 11:44:36 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 08 11:44:41 fir-md1-s1 kernel: Lustre: 23742:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565289874/real 1565289874] req@ffff8f2ae0b60300 x1636757781893344/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565289881 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 11:47:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 11:47:36 fir-md1-s1 kernel: Lustre: Skipped 109 previous similar messages Aug 08 11:47:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 
11:47:46 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 11:54:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 08 11:54:43 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 11:57:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 11:57:46 fir-md1-s1 kernel: Lustre: Skipped 30135 previous similar messages Aug 08 11:57:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 11:57:46 fir-md1-s1 kernel: Lustre: Skipped 30163 previous similar messages Aug 08 12:03:40 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 12:04:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 12:04:43 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 08 12:05:23 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 12:06:31 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 12:07:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 12:07:57 fir-md1-s1 kernel: Lustre: Skipped 17547 previous similar messages Aug 08 12:07:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 12:07:57 fir-md1-s1 kernel: Lustre: Skipped 17584 previous similar messages Aug 08 12:08:37 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), 
not sending early reply req@ffff8f194c84ad00 x1631353802930512/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:11/0 lens 376/1600 e 1 to 0 dl 1565291321 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 12:09:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:09:39 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 12:13:16 fir-md1-s1 kernel: Lustre: 22007:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565291589/real 1565291589] req@ffff8f1aa7f83c00 x1636757819193904/t0(0) o106->fir-MDT0000@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565291596 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 12:15:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 12:15:58 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 08 12:18:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 12:18:00 fir-md1-s1 kernel: Lustre: Skipped 7694 previous similar messages Aug 08 12:18:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 12:18:38 fir-md1-s1 kernel: Lustre: Skipped 7667 previous similar messages Aug 08 12:22:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 12:22:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 12:23:23 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 12:26:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:26:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 12:26:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 08 12:28:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 12:28:02 fir-md1-s1 kernel: Lustre: Skipped 69751 previous similar messages Aug 08 12:28:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 12:28:57 fir-md1-s1 kernel: Lustre: Skipped 69742 previous similar messages Aug 08 12:32:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:35:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 12:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 12:38:08 fir-md1-s1 kernel: Lustre: Skipped 5704 previous similar messages Aug 08 12:38:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 12:38:40 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 08 12:39:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 12:39:03 fir-md1-s1 kernel: Lustre: Skipped 5694 previous similar messages Aug 08 12:47:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:48:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 12:48:19 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 08 12:48:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:49:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 12:49:10 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 12:49:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 12:50:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 12:50:14 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 12:50:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:57:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 12:58:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 12:58:26 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 08 12:59:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 12:59:27 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 08 13:00:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 13:00:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 08 13:06:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 13:08:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 13:08:35 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 08 13:09:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 13:09:35 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 13:10:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 13:10:38 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 08 13:18:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 13:18:36 fir-md1-s1 kernel: Lustre: Skipped 340 previous similar messages Aug 08 13:20:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 13:20:01 fir-md1-s1 kernel: Lustre: Skipped 312 previous similar messages Aug 08 13:22:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 13:22:13 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 13:26:32 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f195eb2ad00 x1631353804896624/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 376/1600 e 1 to 0 dl 1565295997 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 13:26:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 13:26:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e962b86c0/0x5d9ee6be387da7a4 lrc: 3/0,0 mode: PR/PR res: [0x2c002c81d:0x1c:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226c40cf8 expref: 93 pid: 97669 timeout: 4411066 lvb_type: 0 Aug 08 13:26:46 fir-md1-s1 kernel: LustreError: 97669:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2e0ce8f400 ns: mdt-fir-MDT0002_UUID lock: ffff8f19cd8c06c0/0x5d9ee6be387ec07d lrc: 1/0,0 mode: EX/EX res: [0x2c002c81d:0x1c:0x0].0x0 bits 0x8/0x0 rrc: 4 type: IBT flags: 0x54801000000000 nid: 10.8.11.6@o2ib6 remote: 0x721c85a226c40d0d expref: 49 pid: 97669 timeout: 0 lvb_type: 3 Aug 08 13:26:46 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f195eb2ad00 x1631353804896624/t357266007810(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:7/0 lens 376/1568 e 1 to 0 dl 1565295997 ref 1 fl Complete:/0/0 rc -107/-107 Aug 08 13:28:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 13:29:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 13:29:03 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 08 13:30:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 13:30:03 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 13:33:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 13:33:14 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 13:39:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 13:39:11 fir-md1-s1 kernel: Lustre: Skipped 14992 previous similar messages Aug 08 13:39:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 13:40:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 13:40:28 fir-md1-s1 kernel: Lustre: Skipped 14960 previous similar messages Aug 08 13:44:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 13:44:16 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 13:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 13:49:11 fir-md1-s1 kernel: Lustre: Skipped 9140 previous similar messages Aug 08 13:50:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 13:50:36 fir-md1-s1 kernel: Lustre: Skipped 9127 previous similar messages Aug 08 13:53:41 fir-md1-s1 kernel: Lustre: 50442:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565297614/real 1565297614] req@ffff8f33efa8e300 x1636757949412720/t0(0) o106->fir-MDT0000@10.9.114.4@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565297621 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 13:53:48 fir-md1-s1 kernel: Lustre: 50442:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565297621/real 1565297621] req@ffff8f33efa8e300 x1636757949412720/t0(0) o106->fir-MDT0000@10.9.114.4@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565297628 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 08 13:53:50 fir-md1-s1 kernel: Lustre: 23586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f32995f6f00 x1638249604482384/t0(0) o101->83b4afa2-a367-a71c-8602-481ad43297ce@10.8.0.68@o2ib6:24/0 lens 480/568 e 1 to 0 dl 1565297634 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 13:53:56 fir-md1-s1 kernel: Lustre: 50442:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1565297628/real 1565297628] req@ffff8f33efa8e300 x1636757949412720/t0(0) o106->fir-MDT0000@10.9.114.4@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565297635 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 08 13:53:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0c22bc9000, cur 1565297639 expire 1565297489 last 1565297412 Aug 08 13:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0d11f504-1c11-cd97-b8af-49b86c52b9a6 (at 10.9.112.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16b1ed4800, cur 1565297642 expire 1565297492 last 1565297415 Aug 08 13:54:02 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 08 13:56:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 13:56:43 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 08 13:59:05 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 13:59:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 13:59:24 fir-md1-s1 kernel: Lustre: Skipped 61899 previous similar messages Aug 08 13:59:38 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 13:59:47 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 13:59:56 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 14:00:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for 
connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 14:00:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 14:00:29 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 14:01:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 43c47423-a225-1e44-717a-5288b8e7b7db (at 10.8.8.37@o2ib6) reconnecting Aug 08 14:01:00 fir-md1-s1 kernel: Lustre: Skipped 61891 previous similar messages Aug 08 14:01:00 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 14:01:34 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 08 14:03:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 14:04:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 14:05:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 14:08:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 14:08:17 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 14:08:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 14:08:45 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 08 14:09:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 14:09:26 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 08 14:11:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 14:11:06 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 14:18:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 14:19:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 14:19:11 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 08 14:20:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 14:20:04 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 08 14:21:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 14:21:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 14:29:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 14:29:56 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 14:30:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 08 14:30:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 14:30:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 14:30:30 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 08 14:31:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 14:31:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 14:40:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 14:40:02 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 08 14:40:09 fir-md1-s1 kernel: Lustre: 21483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300402/real 1565300402] req@ffff8f2333248f00 x1636758016949264/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300409 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:40:23 fir-md1-s1 kernel: Lustre: 10586:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300416/real 1565300416] req@ffff8f37393cf800 x1636758017205520/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300423 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:40:42 fir-md1-s1 kernel: Lustre: 23747:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300435/real 1565300435] req@ffff8f2fc713ec00 x1636758017547984/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300442 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:40:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 14:40:55 fir-md1-s1 kernel: Lustre: Skipped 
82 previous similar messages Aug 08 14:41:04 fir-md1-s1 kernel: Lustre: 23077:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300457/real 1565300457] req@ffff8f2df083cb00 x1636758017882144/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300464 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:41:10 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300463/real 1565300463] req@ffff8f16cc128000 x1636758017909296/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300470 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:41:24 fir-md1-s1 kernel: Lustre: 21457:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300477/real 1565300477] req@ffff8f1e595b4200 x1636758018149008/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300484 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:41:24 fir-md1-s1 kernel: Lustre: 21457:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 08 14:41:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 14:41:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 14:42:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 14:42:25 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 14:43:37 fir-md1-s1 kernel: Lustre: 50584:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565300610/real 1565300610] req@ffff8f27fd525d00 x1636758020425040/t0(0) o106->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565300617 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 14:50:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 14:50:45 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 08 14:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 14:51:04 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 08 14:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 14:52:10 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 14:58:11 fir-md1-s1 kernel: Lustre: 22288:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1f46f73000 x1631353808618736/t0(0) o101->17e26c1e-4877-4fff-89e1-78bf5463918b@10.8.11.6@o2ib6:16/0 lens 376/1600 e 1 to 0 dl 1565301496 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 14:59:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 14:59:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 15:00:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 15:00:49 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 15:01:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 15:01:25 fir-md1-s1 kernel: Lustre: Skipped 10919 previous similar messages Aug 08 15:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 15:02:34 fir-md1-s1 kernel: Lustre: Skipped 10880 previous similar messages Aug 08 15:11:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 15:11:31 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 08 15:12:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 15:12:36 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 15:14:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 15:14:31 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 08 15:15:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 15:15:12 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 15:17:30 fir-md1-s1 kernel: Lustre: 23746:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565302643/real 1565302643] req@ffff8f1839480300 x1636758068141488/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565302650 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 15:17:37 fir-md1-s1 kernel: Lustre: 23746:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565302650/real 1565302650] req@ffff8f1839480300 x1636758068141488/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565302657 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 08 15:17:46 fir-md1-s1 kernel: Lustre: 22005:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565302659/real 1565302659] req@ffff8f181159da00 x1636758068579616/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565302666 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 15:20:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3d918b61-12b4-26ad-4d6a-14187221c6e3 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22deafe800, cur 1565302842 expire 1565302692 last 1565302615 Aug 08 15:20:42 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 08 15:21:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 15:21:38 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 08 15:23:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 15:23:00 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 15:23:14 fir-md1-s1 kernel: Lustre: 23756:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565302987/real 1565302987] req@ffff8f2d5bba3900 x1636758196337168/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565302994 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 15:23:32 fir-md1-s1 kernel: Lustre: 21371:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2ed1936000 x1639984111695440/t0(0) o101->baaf9aa6-d6ac-d219-ff91-f47dd67dd412@10.8.29.6@o2ib6:7/0 lens 376/1600 e 0 to 0 dl 1565303017 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 15:23:42 fir-md1-s1 kernel: LustreError: 23756:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.10@o2ib6) failed to reply to blocking AST (req@ffff8f2d5bba3900 x1636758196337168 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2d9395f2c0/0x5d9ee6becaa81cc2 lrc: 4/0,0 mode: PR/PR res: [0x200029ecb:0x5dd:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a817796de8d19 expref: 974931 pid: 23761 timeout: 4418104 lvb_type: 0 Aug 08 15:23:42 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.10@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 08 15:23:42 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2d9395f2c0/0x5d9ee6becaa81cc2 lrc: 3/0,0 mode: PR/PR res: [0x200029ecb:0x5dd:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a817796de8d19 expref: 974932 pid: 23761 timeout: 0 lvb_type: 0 Aug 08 15:23:42 fir-md1-s1 kernel: LustreError: 25087:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303022 with bad export cookie 6746082879705672472 Aug 08 15:23:43 fir-md1-s1 kernel: LustreError: 31002:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303023 with bad export cookie 6746082879705672472 Aug 08 15:23:43 fir-md1-s1 kernel: LustreError: 31002:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 12 previous similar messages Aug 08 15:23:44 fir-md1-s1 kernel: LustreError: 25080:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303024 with bad export cookie 6746082879705672472 Aug 08 15:23:44 fir-md1-s1 kernel: LustreError: 25080:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 26 previous similar messages Aug 08 15:23:46 fir-md1-s1 kernel: LustreError: 25087:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303026 with bad export cookie 6746082879705672472 Aug 08 15:23:46 fir-md1-s1 kernel: LustreError: 25087:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 48 previous similar messages Aug 08 15:23:50 fir-md1-s1 kernel: LustreError: 25075:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303030 with bad export cookie 6746082879705672472 Aug 08 15:23:50 fir-md1-s1 kernel: LustreError: 25075:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 94 previous similar messages Aug 08 15:23:58 fir-md1-s1 kernel: LustreError: 
20370:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303038 with bad export cookie 6746082879705672472 Aug 08 15:23:58 fir-md1-s1 kernel: LustreError: 20370:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 271 previous similar messages Aug 08 15:24:14 fir-md1-s1 kernel: LustreError: 20372:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565303054 with bad export cookie 6746082879705672472 Aug 08 15:24:14 fir-md1-s1 kernel: LustreError: 20372:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 586 previous similar messages Aug 08 15:24:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.10@o2ib6, removing former export from same NID Aug 08 15:24:53 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 15:25:12 fir-md1-s1 kernel: LustreError: 23756:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565303022, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2eba3dbcc0/0x5d9ee6bef029347d lrc: 3/0,1 mode: --/EX res: [0x200029ecb:0x5dd:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23756 timeout: 0 lvb_type: 0 Aug 08 15:26:28 fir-md1-s1 kernel: LNet: Service thread pid 23756 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Aug 08 15:26:28 fir-md1-s1 kernel: Pid: 23756, comm: mdt02_106 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Aug 08 15:26:28 fir-md1-s1 kernel: Call Trace:
Aug 08 15:26:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Aug 08 15:26:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Aug 08 15:26:28 fir-md1-s1 kernel: [] mdt_layout_change+0x2a4/0x430 [mdt]
Aug 08 15:26:28 fir-md1-s1 kernel: [] mdt_intent_layout+0x7ee/0xcc0 [mdt]
Aug 08 15:26:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Aug 08 15:26:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Aug 08 15:26:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Aug 08 15:26:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Aug 08 15:26:28 fir-md1-s1 kernel: [] 0xffffffffffffffff
Aug 08 15:26:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565303188.23756
Aug 08 15:27:05 fir-md1-s1 kernel: LNet: Service thread pid 23756 completed after 238.03s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Aug 08 15:30:07 fir-md1-s1 kernel: LustreError: 20463:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2ce139c200 x1636758207744608/t0(0) o104->fir-MDT0000@10.8.9.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 08 15:30:07 fir-md1-s1 kernel: LustreError: 20463:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Aug 08 15:30:32 fir-md1-s1 kernel: Lustre: 23664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2636d72100 x1633731329821088/t0(0) o101->23504e9e-38b0-73ab-6845-a2f9362c9ca3@10.8.29.7@o2ib6:7/0 lens 376/1600 e 0 to 0 dl 1565303437 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 15:30:36 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f297a274ec0/0x5d9ee6bed2cfe512 lrc: 3/0,0 mode: PR/PR res: [0x200029937:0x1d89:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a817797c9afe0 expref: 85481 pid: 21371 timeout: 4418496 lvb_type: 0 Aug 08 15:31:37 fir-md1-s1 kernel: LustreError: 20463:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565303407, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2e8f1ff500/0x5d9ee6bef9216a9a lrc: 3/0,1 mode: --/EX res: [0x200029937:0x1d89:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20463 timeout: 0 lvb_type: 0 Aug 08 15:31:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to eec1f027-4995-ece3-35c3-8add26e67fef (at 10.8.29.7@o2ib6) Aug 08 15:31:40 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Aug 08 15:32:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 08 15:32:43 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 08 15:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 15:33:19 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 15:35:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 15:35:01 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 15:39:56 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1aa06ab000 x1639996177317696/t0(0) o101->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:1/0 lens 480/568 e 0 to 0 dl 1565304001 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 15:41:01 fir-md1-s1 kernel: LustreError: 26253:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565303971, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2509f10b40/0x5d9ee6bf0515507b lrc: 3/0,1 mode: --/PW res: [0x200029f52:0xc0:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26253 timeout: 0 lvb_type: 0 Aug 08 15:41:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 15:41:57 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 08 15:42:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2522b7f500/0x5d9ee6bf05133eea lrc: 3/0,0 mode: PR/PR res: [0x200029f52:0xc0:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 
0xb47a81779bc0ed3c expref: 27 pid: 97659 timeout: 4419180 lvb_type: 0 Aug 08 15:42:00 fir-md1-s1 kernel: LustreError: 26253:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2f34957c00 ns: mdt-fir-MDT0000_UUID lock: ffff8f2509f10b40/0x5d9ee6bf0515507b lrc: 3/0,0 mode: PW/PW res: [0x200029f52:0xc0:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.9.10@o2ib6 remote: 0xb47a81779bc0ee23 expref: 20 pid: 26253 timeout: 0 lvb_type: 0 Aug 08 15:42:00 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:119s); client may timeout. req@ffff8f1aa06ab000 x1639996177317696/t0(0) o101->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:1/0 lens 480/536 e 0 to 0 dl 1565304001 ref 1 fl Complete:/0/0 rc -107/-107 Aug 08 15:43:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 15:43:32 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 08 15:46:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 15:46:01 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 15:46:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 15:46:31 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 15:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 15:51:58 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 08 15:53:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 15:53:34 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 15:55:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client af6aae2b-ae38-09fc-b3b8-80a4a1e2a11f (at 10.8.27.27@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f343d050400, cur 1565304953 expire 1565304803 last 1565304726 Aug 08 15:55:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 15:56:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 15:56:15 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 15:57:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 15:57:41 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 08 15:58:59 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f16d2fa7800 x1639996178584464/t0(0) o101->0a76f504-1306-a831-1f93-856480da5211@10.8.9.10@o2ib6:4/0 lens 480/568 e 1 to 0 dl 1565305144 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 16:00:14 fir-md1-s1 kernel: LustreError: 97666:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565305124, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f16b90b4a40/0x5d9ee6bf1c0925e9 lrc: 3/0,1 mode: --/PW res: [0x200029c72:0x20:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97666 timeout: 0 lvb_type: 0 Aug 08 16:01:13 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f30b2e069c0/0x5d9ee6bf1c08dd3b lrc: 3/0,0 mode: PR/PR res: [0x200029c72:0x20:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81779bc8db20 expref: 37 pid: 10144 timeout: 4420333 lvb_type: 0 Aug 08 16:01:13 fir-md1-s1 kernel: LustreError: 97666:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2cb8d7ec00 ns: mdt-fir-MDT0000_UUID lock: ffff8f16b90b4a40/0x5d9ee6bf1c0925e9 lrc: 3/0,0 mode: PW/PW res: [0x200029c72:0x20:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.9.10@o2ib6 remote: 0xb47a81779bc8db7b expref: 26 pid: 97666 timeout: 0 lvb_type: 0 Aug 08 16:02:10 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 16:02:10 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 08 16:03:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 16:03:36 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 16:06:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 16:06:59 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 16:08:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 16:08:01 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 08 16:12:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 16:12:23 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 08 16:13:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 16:13:50 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 16:18:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 16:18:24 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 16:22:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 16:22:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 16:22:47 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 16:22:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 16:24:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 16:24:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 16:29:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.9.10@o2ib6, removing former export from same NID Aug 08 16:29:09 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 08 16:32:15 fir-md1-s1 kernel: Lustre: 21677:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f27661be600 x1638906250697840/t0(0) o101->d1277529-cbf1-b0b5-ff2d-5b114cf66536@10.9.112.14@o2ib4:20/0 lens 1776/3288 e 1 to 0 dl 1565307140 ref 2 fl Interpret:/0/0 rc 0/0 Aug 08 16:32:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 16:32:49 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 08 16:33:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 16:33:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 16:33:30 fir-md1-s1 kernel: LustreError: 23704:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565307120, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f25e7c3ec00/0x5d9ee6bf48b7b848 lrc: 3/0,1 mode: --/CW res: [0x200029c6e:0x40b:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23704 timeout: 0 lvb_type: 0 Aug 08 16:34:29 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.9.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e7af821c0/0x5d9ee6bf47656f6d lrc: 3/0,0 mode: PR/PR res: [0x200029c6e:0x40b:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.9.10@o2ib6 remote: 0xb47a81779bd3a0b0 expref: 23 pid: 26256 timeout: 4422329 lvb_type: 0 Aug 08 16:35:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 16:35:04 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 16:38:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0a76f504-1306-a831-1f93-856480da5211 (at 10.8.9.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f294f9ec800, cur 1565307502 expire 1565307352 last 1565307275 Aug 08 16:38:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 16:39:12 fir-md1-s1 kernel: LustreError: 48116:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.10@o2ib6 arrived at 1565307552 with bad export cookie 6746083001132501891 Aug 08 16:39:12 fir-md1-s1 kernel: LustreError: 48116:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1208 previous similar messages Aug 08 16:39:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 16:39:34 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 08 16:42:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 16:42:50 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 08 16:45:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 16:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 16:45:16 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 08 16:49:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fd893860-5176-4515-b4a1-d3931097102d (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15b4315000, cur 1565308165 expire 1565308015 last 1565307938 Aug 08 16:49:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 16:49:36 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 16:53:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 16:53:29 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 08 16:55:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 16:55:33 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 16:59:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 16:59:44 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 08 17:00:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 17:00:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 17:03:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 17:03:34 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Aug 08 17:05:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 17:05:54 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 17:09:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 17:09:51 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 17:10:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 17:10:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 17:13:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 17:13:49 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 17:16:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 17:16:11 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 08 17:21:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 17:21:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 17:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 17:24:08 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 08 17:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb 
(at 10.8.10.21@o2ib6) reconnecting Aug 08 17:26:14 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 17:32:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a0daefdf-f61a-656c-5770-ae79f40f7052 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fa6e7c00, cur 1565310756 expire 1565310606 last 1565310529 Aug 08 17:32:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 17:32:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 17:32:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 08 17:34:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 17:34:13 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 08 17:36:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 17:36:39 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 17:40:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 17:40:01 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 17:44:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 17:44:15 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 08 17:44:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 17:44:15 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 17:44:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 08 17:47:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 17:47:26 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 17:51:14 fir-md1-s1 kernel: Lustre: 22281:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565311867/real 1565311867] req@ffff8f28a480b900 x1636758527972032/t0(0) o104->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565311874 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 17:51:14 fir-md1-s1 kernel: Lustre: 22281:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 08 17:53:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 17:53:53 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 17:54:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 17:54:35 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 08 17:54:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 17:54:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 17:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 17:57:37 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 18:02:06 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 08 18:02:06 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar 
messages Aug 08 18:02:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 18:02:57 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 18:04:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 18:04:42 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 08 18:05:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 18:05:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 08 18:07:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 18:07:49 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 08 18:13:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5c9730cd-7436-23ca-8235-b646cceee599 (at 10.9.103.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32d134a800, cur 1565313223 expire 1565313073 last 1565312996 Aug 08 18:13:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 18:14:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 18:14:08 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 18:14:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 18:14:54 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 08 18:15:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 08 18:15:55 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 08 18:18:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 18:18:16 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 08 18:24:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 18:24:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 18:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 18:25:04 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 08 18:26:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 18:26:31 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 18:28:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 18:28:30 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 18:35:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 18:35:20 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 08 18:36:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former 
export from same NID Aug 08 18:36:38 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 18:38:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 18:38:42 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 08 18:38:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 18:38:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 18:41:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 986d7142-8314-d8bb-6b26-61d80ac2ae6f (at 10.9.103.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2eb0987800, cur 1565314900 expire 1565314750 last 1565314673 Aug 08 18:41:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 18:45:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 18:45:39 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 08 18:46:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32bb97dc00, cur 1565315197 expire 1565315047 last 1565314970 Aug 08 18:46:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 18:47:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 08 18:47:13 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 18:48:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 18:48:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 18:48:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 18:48:46 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 08 18:51:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b896d75c-6855-7213-c870-b9e4dae67229 (at 10.9.103.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3e6e79a400, cur 1565315509 expire 1565315359 last 1565315282 Aug 08 18:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 18:55:42 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 08 18:59:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 18:59:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 18:59:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 18:59:18 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 19:00:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 19:00:04 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 19:05:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 19:05:44 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 08 19:09:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 19:09:33 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 19:11:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 19:11:43 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 19:13:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 19:13:27 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 08 19:16:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 19:16:07 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 08 19:19:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 43c47423-a225-1e44-717a-5288b8e7b7db (at 10.8.8.37@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1dbbd27c00, cur 1565317153 expire 1565317003 last 1565316926 Aug 08 19:19:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 19:19:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 19:19:41 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 19:22:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 19:22:54 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 19:25:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 19:25:49 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 19:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 19:26:14 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 08 19:29:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 19:29:47 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 19:35:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 19:35:58 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 19:36:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 19:36:26 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 08 19:37:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 19:37:19 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 08 19:40:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 19:40:03 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 19:46:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 19:46:29 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 08 19:50:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 19:50:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 19:50:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 08 19:50:10 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 08 19:51:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 19:51:00 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 19:56:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 19:56:32 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages Aug 08 20:00:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 20:00:24 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 20:01:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 20:01:41 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 08 20:02:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 20:02:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 20:06:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 20:06:44 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 08 20:10:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 20:10:45 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 20:12:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 20:12:36 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 20:16:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 20:16:50 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 08 20:21:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 
10.8.30.23@o2ib6) reconnecting Aug 08 20:21:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 08 20:23:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 20:23:20 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 08 20:23:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 20:23:32 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 20:26:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 08 20:26:55 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 08 20:29:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 20:31:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 20:31:21 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 20:32:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 20:32:40 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 20:33:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 20:33:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 08 20:36:28 fir-md1-s1 kernel: Lustre: 21459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565321781/real 1565321781] req@ffff8f204d030c00 x1636758866966720/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565321788 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 20:36:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 20:36:56 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 08 20:37:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 20:37:46 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 20:41:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 20:41:26 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 20:43:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 20:43:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 20:47:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 20:47:22 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 08 20:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 20:51:51 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 20:53:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 20:53:45 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 20:55:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 20:55:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 20:57:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 20:57:29 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 08 21:02:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 21:02:08 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 08 21:05:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 21:05:20 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 21:07:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 21:07:54 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 08 21:10:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 21:10:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 21:12:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 21:12:10 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 21:13:31 fir-md1-s1 kernel: Lustre: 23588:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565324004/real 1565324004] req@ffff8f2baa3d8f00 x1636758917802608/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565324011 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 21:14:52 fir-md1-s1 kernel: Lustre: 20722:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565324085/real 1565324085] req@ffff8f1f10df3c00 x1636758919885104/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565324092 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 21:15:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 21:15:30 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 08 21:18:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 21:18:10 fir-md1-s1 kernel: Lustre: Skipped 90 previous similar messages Aug 08 21:22:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 21:22:23 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 08 21:26:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 21:26:16 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 08 21:28:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 21:28:12 fir-md1-s1 kernel: 
Lustre: Skipped 68 previous similar messages Aug 08 21:28:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 21:28:31 fir-md1-s1 kernel: LustreError: Skipped 9 previous similar messages Aug 08 21:32:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 21:32:23 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 21:36:04 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 08 21:36:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 21:36:17 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 21:38:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 21:38:17 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 08 21:39:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 21:39:03 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 21:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 08 21:42:37 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 08 21:47:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 21:47:05 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 08 21:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 21:48:32 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 08 21:51:21 fir-md1-s1 kernel: Lustre: 23679:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565326274/real 1565326274] req@ffff8f3175ef0c00 x1636758967951424/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565326281 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 21:52:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 21:52:42 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 21:57:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 21:57:11 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 08 21:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 21:58:35 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 08 21:59:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 21:59:49 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 22:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 22:02:54 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 08 22:03:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 22:08:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 08 22:08:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 22:08:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 22:08:54 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 08 22:13:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 22:13:22 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 22:13:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 22:13:34 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 22:18:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 08 22:18:36 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 08 22:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 22:19:00 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 08 22:19:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 22:19:25 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 22:23:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 22:23:39 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 08 22:29:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 22:29:32 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 08 22:29:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 22:29:47 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 08 22:33:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 22:33:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 22:33:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 22:33:56 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 22:38:35 fir-md1-s1 kernel: Lustre: 23657:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565329108/real 1565329108] req@ffff8f2621f16300 x1636759031357216/t0(0) o104->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565329115 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 08 22:39:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 22:39:36 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 08 22:39:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 22:39:51 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 08 22:44:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 22:44:42 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 08 22:49:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 22:49:27 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 08 22:49:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 22:49:43 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 08 22:51:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 22:51:06 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 08 22:54:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 22:54:55 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 22:59:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 22:59:57 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 08 23:01:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 23:01:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 08 23:03:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 23:03:22 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 08 23:05:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 23:05:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 08 23:10:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 08 23:10:10 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 08 23:14:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 23:14:29 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 08 23:15:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 08 23:15:26 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 08 23:15:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 23:15:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 23:20:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 23:20:13 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 08 23:25:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 23:25:08 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 23:26:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 23:26:12 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 08 23:30:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 08 23:30:18 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 08 23:30:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 08 23:30:29 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 08 23:35:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 23:35:46 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 08 23:37:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 08 23:37:00 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 08 23:40:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f081a199000, cur 1565332813 expire 1565332663 last 1565332586 Aug 08 23:40:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 08 23:40:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 08 23:40:49 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 08 23:46:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 23:46:32 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 08 23:47:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 08 23:47:38 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 08 23:49:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 08 23:49:10 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 08 23:51:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 08 23:51:40 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Aug 08 23:56:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 08 23:56:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 08 23:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 08 23:58:06 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 00:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 00:01:43 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 09 00:05:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 00:05:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 00:06:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 00:06:46 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 09 00:08:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 00:08:18 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 00:11:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 00:11:50 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 09 00:16:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 00:16:37 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 09 00:17:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 00:17:57 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 09 00:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 00:19:01 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 00:21:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 00:21:55 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 09 00:27:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 00:27:07 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 00:27:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 00:27:58 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 09 00:29:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 00:29:24 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 00:31:37 fir-md1-s1 kernel: Lustre: 21371:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565335890/real 1565335890] req@ffff8f2ba86c5400 x1636759180912496/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565335897 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 00:32:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 00:32:30 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 09 00:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 00:39:42 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 09 00:40:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 00:40:27 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 09 00:40:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e6c87e400, cur 1565336441 expire 1565336291 last 1565336214 Aug 09 00:42:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 00:42:35 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 09 00:47:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22a32de800, cur 1565336878 expire 1565336728 last 1565336651 Aug 09 00:47:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 00:47:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 00:50:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 00:50:09 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 00:53:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 00:53:01 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 09 00:53:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 00:53:15 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 09 01:00:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 01:00:21 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 01:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 01:03:22 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 09 01:03:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP 
connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 01:03:39 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 09 01:05:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 01:10:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 01:10:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 01:11:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 01:12:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 01:13:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 01:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 01:13:25 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 09 01:15:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 01:15:06 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 09 01:19:52 fir-md1-s1 kernel: Lustre: 23605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565338785/real 1565338785] req@ffff8f230dc56900 x1636759240373664/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565338792 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 01:20:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 01:20:50 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 01:22:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2935556400, cur 1565338937 expire 1565338787 last 1565338710 Aug 09 01:23:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 01:23:43 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 09 01:24:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 01:26:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 01:26:53 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 09 01:30:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 01:30:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 01:31:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 01:31:13 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 01:33:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 01:33:48 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 01:37:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 01:37:18 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 01:37:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 01:37:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 01:41:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 01:41:18 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 09 01:43:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 01:43:49 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 09 01:48:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 01:48:08 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 09 01:48:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 01:48:40 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 09 01:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 01:51:56 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 01:54:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 01:54:08 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 09 01:58:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 01:58:40 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 09 02:02:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 
10.8.12.12@o2ib6) reconnecting Aug 09 02:02:29 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 02:04:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 02:04:28 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 09 02:08:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 02:08:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 02:08:52 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 02:12:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 02:13:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 02:13:11 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 02:14:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 02:14:35 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 09 02:15:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 02:15:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 02:19:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 02:19:21 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 09 02:20:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 02:20:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 02:23:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 02:23:32 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 02:24:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 02:24:51 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 09 02:29:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 02:29:36 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 02:30:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9dfc2bda-cf66-13a5-c506-30cd55e4267b (at 10.9.108.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd873000, cur 1565343025 expire 1565342875 last 1565342798 Aug 09 02:30:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f34eff35-9f31-0888-c4bb-e6f93e879de4 (at 10.9.108.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24fe402800, cur 1565343040 expire 1565342890 last 1565342813 Aug 09 02:30:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 02:34:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 02:34:11 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 09 02:35:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 02:35:31 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 09 02:35:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 02:40:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 02:40:38 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 09 02:45:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 02:45:22 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 02:45:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 02:45:32 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 09 02:48:02 fir-md1-s1 kernel: Lustre: 23710:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a77243f00 x1634829538795104/t0(0) o36->2f3c211c-52c6-1ee5-d4d4-865b726ca750@10.8.11.14@o2ib6:7/0 lens 504/2888 e 1 to 0 dl 1565344087 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 02:48:14 fir-md1-s1 kernel: Lustre: 10146:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1f2cf8cb00 x1631686914929856/t0(0) 
o101->8a2377b9-dd4d-1468-124f-a22e5b47b9b4@10.8.11.23@o2ib6:19/0 lens 576/3264 e 0 to 0 dl 1565344099 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 02:48:14 fir-md1-s1 kernel: Lustre: 10146:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 09 02:48:15 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f276e73f200 x1639549046976448/t0(0) o101->84093e77-fb6d-e471-60a4-cda91580dd1f@10.8.10.33@o2ib6:20/0 lens 584/3264 e 1 to 0 dl 1565344100 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 02:48:17 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2e83250900 x1631696401528160/t0(0) o101->b103a359-9ee5-5c8b-3bb2-b2e446ea13b7@10.8.11.13@o2ib6:22/0 lens 584/3264 e 0 to 0 dl 1565344102 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 02:48:17 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 09 02:48:21 fir-md1-s1 kernel: Lustre: 23586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f34623e8c00 x1635106589243760/t0(0) o101->f0500fad-d6f6-55b9-90d1-85c7444ded54@10.8.1.10@o2ib6:26/0 lens 576/3264 e 0 to 0 dl 1565344106 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 02:48:21 fir-md1-s1 kernel: Lustre: 23586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Aug 09 02:48:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 02:48:42 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:1s); client may timeout. 
req@ffff8f276e73f200 x1639549046976448/t0(0) o101->84093e77-fb6d-e471-60a4-cda91580dd1f@10.8.10.33@o2ib6:20/0 lens 584/592 e 1 to 0 dl 1565344121 ref 1 fl Complete:/0/0 rc 0/0 Aug 09 02:50:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 02:50:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 02:52:13 fir-md1-s1 kernel: Lustre: 21371:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565344326/real 1565344326] req@ffff8f2f2140c800 x1636759361447488/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565344333 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 02:52:20 fir-md1-s1 kernel: Lustre: 21371:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565344333/real 1565344333] req@ffff8f2f2140c800 x1636759361447488/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565344340 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 02:55:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 02:55:32 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 09 02:56:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 02:56:40 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 09 03:00:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 03:00:21 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 03:01:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 03:01:06 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 03:05:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 03:05:49 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 03:06:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 03:06:46 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 03:12:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 03:12:14 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 03:15:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 03:15:41 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 03:16:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 03:16:49 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 03:16:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 03:16:49 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 03:23:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 03:23:56 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 03:25:45 fir-md1-s1 kernel: Lustre: 23678:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565346338/real 1565346338] req@ffff8f2831792d00 x1636759410008816/t0(0) o106->fir-MDT0002@10.8.22.20@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565346345 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 03:27:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 03:27:00 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 03:27:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 03:27:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 03:27:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 03:27:37 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 09 03:33:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 03:33:57 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 03:37:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 03:37:01 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 09 03:37:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 03:37:37 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 03:44:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 03:44:28 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 03:47:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 03:47:11 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Aug 09 03:48:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 03:48:08 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 09 03:55:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 03:55:04 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 03:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 03:57:18 fir-md1-s1 kernel: Lustre: Skipped 
46 previous similar messages Aug 09 03:58:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 03:58:24 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 04:01:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 04:01:55 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 09 04:04:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 04:05:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 04:05:57 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 09 04:07:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 04:07:27 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 09 04:08:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 04:08:37 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 04:09:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 04:16:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 04:16:33 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 09 04:17:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 04:17:29 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 04:18:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 04:18:59 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 04:20:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 04:20:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 04:27:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 04:27:13 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 04:27:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 04:27:30 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 09 04:29:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 04:29:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 04:37:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 04:37:14 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 04:37:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 04:37:31 fir-md1-s1 kernel: Lustre: Skipped 69 
previous similar messages Aug 09 04:39:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 04:39:18 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 04:40:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 04:47:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 04:47:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 04:47:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 04:47:33 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 09 04:49:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 04:49:51 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 04:57:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 04:57:04 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 04:57:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 04:57:28 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 04:57:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 04:57:45 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 09 05:01:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 05:01:20 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 05:07:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 05:07:46 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 09 05:08:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 05:08:46 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 09 05:11:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 05:11:22 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 05:12:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 05:12:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 05:17:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 05:17:52 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 09 05:20:37 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f42cda97400, cur 1565353237 expire 1565353087 last 1565353010 Aug 09 05:21:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 05:21:28 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 05:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 05:21:36 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 05:27:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 05:27:58 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 09 05:31:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 05:31:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 05:34:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 05:34:08 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 05:38:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 05:38:05 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 05:38:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 05:38:43 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 05:40:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 05:40:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 05:41:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 05:41:59 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 05:43:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 05:47:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 05:47:17 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 09 05:48:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aed97f27-8964-9f20-4c14-e539f266f21b (at 10.8.24.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1cd8e46000, cur 1565354886 expire 1565354736 last 1565354659 Aug 09 05:48:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client aed97f27-8964-9f20-4c14-e539f266f21b (at 10.8.24.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f42e69a4400, cur 1565354904 expire 1565354754 last 1565354677 Aug 09 05:48:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 05:48:26 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 09 05:52:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 05:52:28 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 05:57:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e25f5e400, cur 1565355421 expire 1565355271 last 1565355194 Aug 09 05:57:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 05:57:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 05:57:19 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 05:58:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 05:58:31 fir-md1-s1 kernel: Lustre: Skipped 68 previous similar messages Aug 09 06:02:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 06:02:30 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 06:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 06:08:33 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 06:10:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 06:10:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 06:12:24 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 06:12:24 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 06:12:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 06:12:30 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 06:15:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 06:17:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 06:18:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 06:18:40 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 09 06:20:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 06:20:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 06:22:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 06:22:05 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 06:23:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 06:23:00 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 06:28:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 06:28:26 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 06:29:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 06:29:44 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 09 06:32:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 06:32:20 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 06:33:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 06:33:10 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 06:39:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 06:39:53 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 09 06:42:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 06:42:52 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 09 06:43:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 06:43:36 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 06:50:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 06:50:17 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 09 06:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 06:53:38 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 06:54:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former 
export from same NID Aug 09 06:54:03 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 09 07:00:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 07:00:19 fir-md1-s1 kernel: Lustre: Skipped 21798 previous similar messages Aug 09 07:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 07:04:03 fir-md1-s1 kernel: Lustre: Skipped 21758 previous similar messages Aug 09 07:04:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 07:04:12 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 09 07:05:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 07:05:05 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 07:08:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 07:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 07:10:39 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 09 07:13:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 07:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 07:14:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 07:15:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 07:15:52 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 07:21:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 07:21:05 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 09 07:24:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 07:24:26 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 07:24:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 07:24:56 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 07:27:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 09 07:27:41 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 09 07:31:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 07:31:15 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 07:34:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 07:34:33 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 07:37:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 07:37:21 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 07:38:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 07:38:41 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 09 07:41:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 07:41:38 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 09 07:44:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 07:44:51 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 09 07:48:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 07:48:59 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 09 07:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 07:51:44 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 09 07:54:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 07:54:08 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 07:54:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 07:54:55 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 07:59:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 07:59:18 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 08:01:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 08:01:50 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Aug 09 08:05:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 08:05:48 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 08:09:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 08:09:46 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 08:11:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 08:11:52 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 09 08:16:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 08:16:29 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 08:17:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 08:17:18 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 08:19:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 08:19:46 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 08:19:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:21:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 08:21:52 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 09 08:27:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 08:27:04 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 08:30:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 08:30:17 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 08:30:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fa11fc000, cur 1565364624 expire 1565364474 last 1565364397 Aug 09 08:31:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 08:31:57 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 09 08:34:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:35:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:36:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:37:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 08:37:18 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 08:40:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 08:40:25 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 08:40:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f23c16b7000, cur 1565365241 expire 1565365091 last 1565365014 Aug 09 08:42:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 08:42:04 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 09 08:47:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 08:47:48 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 08:49:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:49:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 08:50:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 08:51:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 08:51:21 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 08:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 08:52:08 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 08:52:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:53:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:57:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 08:57:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 08:58:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 08:58:09 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 08:59:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 08:59:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 09:02:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 09:02:27 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 09 09:02:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 09:02:27 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 09 09:04:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3951b7f800, cur 1565366645 expire 1565366495 last 1565366418 Aug 09 09:08:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 09:08:14 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 09:10:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2d2aa9ac00, cur 1565367004 expire 1565366854 last 1565366777 Aug 09 09:12:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 09:12:29 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 09:12:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 09:12:29 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 09 09:18:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 09:18:16 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 09:22:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 09:22:37 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 09 09:22:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 09:22:52 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 09 09:24:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 09:24:24 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 09:26:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 09:28:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 09:28:28 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 09:30:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d1debb000, cur 1565368236 expire 1565368086 last 1565368009 Aug 09 09:32:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 09:32:46 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 09 09:34:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 09:34:15 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 09:35:26 fir-md1-s1 kernel: Lustre: 23725:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565368519/real 1565368519] req@ffff8f34ca950000 x1636759964168400/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565368526 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:37:40 fir-md1-s1 kernel: Lustre: 25681:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0c97abd700 x1631556113548720/t0(0) o101->6efc0e4b-1ad3-bb80-daf0-68493389a065@10.9.106.18@o2ib4:15/0 lens 1800/3288 e 0 to 0 dl 1565368665 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:37:40 fir-md1-s1 kernel: Lustre: 25681:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 09 09:37:53 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f17ff270000 x1641226972811952/t0(0) o36->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:28/0 lens 504/2888 e 0 
to 0 dl 1565368678 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:38:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 09:38:29 fir-md1-s1 kernel: Lustre: Skipped 138657 previous similar messages Aug 09 09:38:30 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565368703/real 1565368703] req@ffff8f1a5aa2c800 x1636759968793056/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565368710 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:39:04 fir-md1-s1 kernel: Lustre: 23725:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565368737/real 1565368737] req@ffff8f3384d52a00 x1636759969537200/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565368744 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:39:11 fir-md1-s1 kernel: Lustre: 23725:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565368744/real 1565368744] req@ffff8f3384d52a00 x1636759969537200/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565368751 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 09:39:18 fir-md1-s1 kernel: Lustre: 23725:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565368751/real 1565368751] req@ffff8f3384d52a00 x1636759969537200/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565368758 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 09:40:26 fir-md1-s1 kernel: Lustre: 23619:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565368819/real 1565368819] req@ffff8f2a6c6b8300 x1636759971085424/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565368826 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:42:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 09:42:48 fir-md1-s1 kernel: Lustre: Skipped 138685 previous similar messages Aug 09 09:43:00 fir-md1-s1 kernel: Lustre: 20555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f21dc9bb900 x1634919543007136/t0(0) o101->b908ba12-d3b5-f9d6-09e5-19b8bbf56c0a@10.8.20.28@o2ib6:5/0 lens 576/3264 e 0 to 0 dl 1565368985 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:44:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 09 09:44:30 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 09 09:46:34 fir-md1-s1 kernel: Lustre: 20721:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565369187/real 1565369187] req@ffff8f1f10d81b00 x1636759980861184/t0(0) o106->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565369194 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:46:34 fir-md1-s1 kernel: Lustre: 20721:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 09:47:16 fir-md1-s1 kernel: Lustre: 21675:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565369229/real 1565369229] req@ffff8f4299fa6300 x1636759982416752/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565369236 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:47:16 fir-md1-s1 kernel: Lustre: 21675:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 09:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 09:48:34 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 09 09:50:21 fir-md1-s1 kernel: Lustre: 23665:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565369414/real 
1565369414] req@ffff8f27f45ca400 x1636759986413792/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565369421 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:52:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 09:52:51 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 09 09:55:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 09 09:55:11 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 09:58:33 fir-md1-s1 kernel: Lustre: 97650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565369906/real 1565369906] req@ffff8f1cefed3000 x1636759996809680/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565369913 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 09:58:33 fir-md1-s1 kernel: Lustre: 97650:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 09:58:41 fir-md1-s1 kernel: Lustre: 97655:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1982c6c800 x1641226973812128/t0(0) o36->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:16/0 lens 504/2888 e 1 to 0 dl 1565369926 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:58:41 fir-md1-s1 kernel: Lustre: 97655:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Aug 09 09:58:43 fir-md1-s1 kernel: Lustre: 23620:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/3), not sending early reply req@ffff8f3be2a6da00 x1641226973812176/t0(0) o101->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:18/0 lens 576/3264 e 0 to 0 dl 1565369928 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7504a0d-490a-d58a-1f75-439227e99fde (at 10.9.104.27@o2ib4) reconnecting Aug 09 
09:58:47 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 09 09:59:12 fir-md1-s1 kernel: Lustre: 23725:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f399135ec00 x1631556948328256/t0(0) o36->bfb6e805-d5d9-30c1-c57c-f5c9b6f9d250@10.9.103.41@o2ib4:17/0 lens 496/448 e 0 to 0 dl 1565369957 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:59:15 fir-md1-s1 kernel: Lustre: 97654:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f054700b000 x1641226973957024/t0(0) o101->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:20/0 lens 576/3264 e 0 to 0 dl 1565369960 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 09:59:15 fir-md1-s1 kernel: Lustre: 97654:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 09 09:59:16 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f222132b180/0x5d9ee6c1b82cfb19 lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 14 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a227068cc0 expref: 448 pid: 22283 timeout: 4485016 lvb_type: 0 Aug 09 10:03:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 10:03:37 fir-md1-s1 kernel: Lustre: Skipped 1144 previous similar messages Aug 09 10:06:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 09 10:06:04 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 10:08:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 10:08:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 10:08:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 10:08:57 fir-md1-s1 kernel: Lustre: Skipped 1111 previous similar messages Aug 09 10:09:00 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1ab4253c00 x1631704189417776/t0(0) o101->8002065c-41f3-287b-74f3-bbfad6694e44@10.8.25.31@o2ib6:5/0 lens 576/3264 e 0 to 0 dl 1565370545 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 10:09:00 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 09 10:09:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1de2e26c00/0x5d9ee6c1bb29afe3 lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 496 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a227069675 expref: 19 pid: 97650 timeout: 4485604 lvb_type: 0 Aug 09 10:12:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2918948400, cur 1565370776 expire 1565370626 last 1565370549 Aug 09 10:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 10:13:37 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 10:18:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 10:19:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 10:19:04 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 10:19:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 10:19:19 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 10:19:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 10:20:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 10:23:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 10:23:44 fir-md1-s1 kernel: Lustre: Skipped 122541 previous similar messages Aug 09 10:27:36 fir-md1-s1 kernel: Lustre: 97669:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565371649/real 1565371649] req@ffff8f1e59adce00 x1636760035448224/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565371656 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 10:27:36 fir-md1-s1 kernel: Lustre: 97669:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 09 10:29:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 10:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 10:29:07 fir-md1-s1 kernel: Lustre: Skipped 293321 previous similar messages Aug 09 10:29:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 10:29:59 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 09 10:32:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 10:34:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 10:34:24 fir-md1-s1 kernel: Lustre: Skipped 170837 previous similar messages Aug 09 10:39:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 10:39:14 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 10:40:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 10:40:09 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 10:42:23 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 09 10:42:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 10:42:46 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 10:44:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 10:44:52 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 10:49:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 10:49:18 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 10:50:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 10:50:12 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 10:50:51 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565373044/real 1565373044] req@ffff8f2ed36f7200 x1636760067136080/t0(0) o104->fir-MDT0000@10.8.28.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565373051 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 10:50:51 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 10:50:58 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565373051/real 1565373051] req@ffff8f2ed36f7200 x1636760067136080/t0(0) o104->fir-MDT0000@10.8.28.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565373058 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 10:50:59 fir-md1-s1 kernel: Lustre: 25675:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2d57a7f500 x1635156252440640/t0(0) o101->7d5bccd7-a1b0-31f0-f111-9137011dc81d@10.9.109.18@o2ib4:4/0 lens 576/3264 e 1 to 0 dl 1565373064 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 10:50:59 fir-md1-s1 kernel: Lustre: 25675:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 36 previous similar messages Aug 09 10:51:00 
fir-md1-s1 kernel: Lustre: 25675:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e1ffaf800 x1636352796920784/t0(0) o101->374fd2d9-2972-20b7-dfa4-bf6b2470cf36@10.8.1.6@o2ib6:5/0 lens 576/3264 e 1 to 0 dl 1565373065 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 10:51:00 fir-md1-s1 kernel: Lustre: 25675:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 119 previous similar messages Aug 09 10:51:02 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26fae96000 x1631774380680336/t0(0) o101->5e395341-d08e-b211-8691-de95d36d3421@10.8.13.21@o2ib6:7/0 lens 576/0 e 1 to 0 dl 1565373067 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 09 10:51:02 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 97 previous similar messages Aug 09 10:51:06 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f34e44b9200 x1634175044210736/t0(0) o101->5d2e8657-ca4a-b8d0-c53b-852f922061df@10.9.104.59@o2ib4:11/0 lens 576/0 e 1 to 0 dl 1565373071 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 09 10:51:06 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 113 previous similar messages Aug 09 10:51:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 11f7dba6-7171-5836-2062-1974c5637c6a (at 10.8.28.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1936063000, cur 1565373069 expire 1565372919 last 1565372842 Aug 09 10:51:13 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.11.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f22b1d9a880/0x5d9ee6c1bf95002a lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x60200400000020 nid: 10.8.11.6@o2ib6 remote: 0x721c85a227069970 expref: 36 pid: 20465 timeout: 4488133 lvb_type: 0 Aug 09 10:51:13 fir-md1-s1 kernel: Lustre: 23666:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:5s); client may timeout. req@ffff8f40db322a00 x1631653974103664/t0(0) o101->8290d58b-0905-6161-be47-84efd8d09138@10.9.108.18@o2ib4:8/0 lens 576/0 e 1 to 0 dl 1565373068 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 09 10:51:13 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 09 10:51:13 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 09 10:51:13 fir-md1-s1 kernel: Lustre: 23666:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 118 previous similar messages Aug 09 10:54:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 10:54:47 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 10:55:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 10:55:10 fir-md1-s1 kernel: Lustre: Skipped 291 previous similar messages Aug 09 10:56:52 fir-md1-s1 kernel: Lustre: 23625:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565373401/real 1565373401] req@ffff8f0af3af2400 x1636760074627520/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565373412 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 10:56:52 fir-md1-s1 kernel: Lustre: 23625:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 10:57:06 fir-md1-s1 kernel: Lustre: 23686:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3672230000 x1631597699536000/t0(0) o101->f5468e72-fdf8-2c00-55b2-35b2a8b48641@10.9.107.9@o2ib4:11/0 lens 576/3264 e 0 to 0 dl 1565373431 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 10:57:06 fir-md1-s1 kernel: Lustre: 23686:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 590 previous similar messages Aug 09 10:57:13 fir-md1-s1 kernel: Lustre: 23625:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565373422/real 1565373422] req@ffff8f0af3af2400 x1636760074627520/t0(0) o104->fir-MDT0002@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565373433 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 09 10:57:22 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1c12dfd400 x1631598884020784/t0(0) o101->00035bd8-9418-aa27-cd30-274f6ec9cbbd@10.9.104.60@o2ib4:27/0 lens 576/0 e 0 to 0 dl 1565373447 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 09 10:57:22 fir-md1-s1 kernel: Lustre: 20729:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 850 
previous similar messages Aug 09 10:57:35 fir-md1-s1 kernel: Lustre: 23560:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:18s); client may timeout. req@ffff8f07c0f13000 x1631568410381744/t0(0) o101->bbd3b988-dccb-0391-b9e0-c34c4c36894a@10.9.105.13@o2ib4:17/0 lens 576/0 e 0 to 0 dl 1565373437 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 09 10:57:35 fir-md1-s1 kernel: Lustre: 23634:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:18s); client may timeout. req@ffff8f22d8708900 x1634130967967536/t0(0) o101->a7aad8e9-6055-f520-5dcf-5ea6b8e2ae73@10.9.104.52@o2ib4:17/0 lens 576/0 e 0 to 0 dl 1565373437 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 09 10:57:35 fir-md1-s1 kernel: LustreError: 23632:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.102.50@o2ib4: deadline 30:1s ago req@ffff8f0eeb477b00 x1631628792504304/t0(0) o101->d2cbc696-5cd3-e3f4-6274-69bc361797a1@10.9.102.50@o2ib4:4/0 lens 576/0 e 0 to 0 dl 1565373454 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 09 10:57:35 fir-md1-s1 kernel: LustreError: 21417:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.102.50@o2ib4: deadline 30:1s ago req@ffff8f099aaee600 x1631628792504240/t0(0) o101->d2cbc696-5cd3-e3f4-6274-69bc361797a1@10.9.102.50@o2ib4:4/0 lens 576/0 e 0 to 0 dl 1565373454 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 09 10:57:35 fir-md1-s1 kernel: LustreError: 23632:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Aug 09 10:57:35 fir-md1-s1 kernel: LustreError: 21417:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Aug 09 10:57:35 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 09 10:57:35 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) 
Skipped 2 previous similar messages Aug 09 10:57:35 fir-md1-s1 kernel: Lustre: 23634:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 528 previous similar messages Aug 09 10:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 10:59:20 fir-md1-s1 kernel: Lustre: Skipped 786 previous similar messages Aug 09 11:00:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 11:00:16 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 09 11:05:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 11:05:11 fir-md1-s1 kernel: Lustre: Skipped 598 previous similar messages Aug 09 11:08:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 11:08:14 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 11:09:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 11:09:21 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 11:10:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 11:10:36 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 09 11:15:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 11:15:28 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 09 11:19:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 11:19:38 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 11:20:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 09 11:20:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 09 11:22:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 11:22:35 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 11:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 11:25:34 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 09 11:30:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 11:30:16 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 11:31:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 11:31:18 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 11:33:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 11:33:16 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 11:35:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 11:35:35 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 09 11:41:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 11:41:15 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 11:41:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 11:41:24 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 11:45:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 11:45:36 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 11:46:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect 
from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 11:46:56 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 11:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 11:51:26 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 11:52:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 11:52:19 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 09 11:55:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 11:55:44 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Aug 09 12:00:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 12:00:03 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 12:01:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 12:01:30 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 12:02:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 12:02:35 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 09 12:05:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 12:05:53 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 09 12:11:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 12:11:43 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 09 12:15:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 12:15:32 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 12:15:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 12:15:56 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 09 12:22:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 12:22:02 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 12:25:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 12:25:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 12:25:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 12:25:39 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 12:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 12:26:14 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 09 12:32:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 12:32:03 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 12:35:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 12:36:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 12:36:25 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 09 12:36:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 12:36:56 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 09 12:42:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 12:42:09 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 12:46:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 12:46:25 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 09 12:47:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 12:47:33 fir-md1-s1 kernel: Lustre: Skipped 51 
previous similar messages Aug 09 12:52:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 12:52:13 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 12:52:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 12:52:58 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 12:53:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 12:56:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 12:56:27 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Aug 09 12:57:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:00:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 09 13:00:08 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 09 13:02:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 13:02:26 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 13:06:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 13:06:32 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 09 13:08:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:08:30 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 13:10:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 09 13:10:49 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 13:12:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 13:12:37 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 13:13:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:13:50 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 13:16:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 13:16:44 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 09 13:16:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2a3bbb7800, cur 1565381818 expire 1565381668 last 1565381591 Aug 09 13:16:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 09 13:20:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 13:20:50 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 13:22:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 13:22:50 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 09 13:24:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:24:10 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 13:26:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 13:26:46 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 09 13:31:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 13:31:43 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 13:32:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 13:32:51 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 13:36:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 13:36:49 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 09 13:42:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 13:42:52 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 13:43:07 fir-md1-s1 kernel: Lustre: 
MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 13:43:07 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 09 13:45:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:45:50 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 13:47:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 13:47:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:47:20 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 09 13:51:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 13:53:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 13:53:00 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 13:53:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 13:53:08 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 13:57:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 13:57:28 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 09 14:00:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 14:03:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 14:03:08 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 14:03:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 14:03:17 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 14:07:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 14:07:28 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 14:08:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 09 14:08:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Aug 09 14:12:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 14:13:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 14:13:33 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 09 14:14:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 14:14:13 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 09 14:17:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 14:17:45 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 09 14:22:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 14:22:16 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 14:23:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 14:23:38 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 14:25:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 14:25:00 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 09 14:28:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 14:28:09 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 09 14:33:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 14:33:56 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 14:35:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 14:35:15 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Aug 09 14:37:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 14:37:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 14:38:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 14:38:19 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 09 14:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 14:44:08 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 09 14:45:56 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387149/real 1565387149] req@ffff8f38fca94b00 x1636760405729792/t0(0) o106->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565387156 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 14:45:56 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 09 14:46:04 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387156/real 1565387156] req@ffff8f38fca94b00 x1636760405729792/t0(0) o106->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565387163 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 14:46:14 fir-md1-s1 kernel: Lustre: 21672:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2c70095700 x1635104137079856/t0(0) o101->7933cca6-376e-2621-120f-991576fc8851@10.9.109.52@o2ib4:19/0 lens 480/568 e 0 to 0 dl 1565387179 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 14:46:14 fir-md1-s1 kernel: Lustre: 21672:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 209 previous similar messages 
Aug 09 14:46:18 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387171/real 1565387171] req@ffff8f38fca94b00 x1636760405729792/t0(0) o106->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565387178 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 14:46:18 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 14:46:39 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387192/real 1565387192] req@ffff8f38fca94b00 x1636760405729792/t0(0) o106->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565387199 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 14:46:39 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 09 14:47:14 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387227/real 1565387227] req@ffff8f38fca94b00 x1636760405729792/t0(0) o106->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565387234 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 14:47:14 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 09 14:47:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 09 14:47:31 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 09 14:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 14:48:22 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 09 14:48:24 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387297/real 
1565387297] req@ffff8f38fca94b00 x1636760405729792/t0(0) o106->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565387304 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 14:48:24 fir-md1-s1 kernel: Lustre: 23741:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 09 14:48:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 02249c69-e1bd-6d39-9214-362a9f883626 (at 10.9.103.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17f3754c00, cur 1565387324 expire 1565387174 last 1565387097 Aug 09 14:48:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 02249c69-e1bd-6d39-9214-362a9f883626 (at 10.9.103.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f38fe351000, cur 1565387334 expire 1565387184 last 1565387107 Aug 09 14:48:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 14:48:54 fir-md1-s1 kernel: LustreError: 21667:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f42c6579b00 x1636760410166704/t0(0) o104->fir-MDT0000@10.9.103.34@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 09 14:49:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 09 14:50:37 fir-md1-s1 kernel: Lustre: 23572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f077f55bc00 x1641226988288976/t0(0) o101->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:12/0 lens 576/3264 e 0 to 0 dl 1565387442 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 14:50:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2875785e80/0x5d9ee6c23d48d90e lrc: 3/0,0 mode: PR/PR res: [0x200029fb4:0x367:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: 
IBT flags: 0x60200400000020 nid: 10.8.30.23@o2ib6 remote: 0xe713fe56191a8d6 expref: 1233 pid: 20734 timeout: 4502501 lvb_type: 0 Aug 09 14:50:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 14:50:49 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 14:54:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 14:54:35 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 09 14:56:35 fir-md1-s1 kernel: Lustre: 24587:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565387788/real 1565387788] req@ffff8f34a36e6f00 x1636760421052464/t0(0) o106->fir-MDT0002@10.8.30.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565387795 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 14:56:35 fir-md1-s1 kernel: Lustre: 24587:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 09 14:57:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 14:57:33 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 09 14:58:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 14:58:33 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 09 15:01:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 15:01:06 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 09 15:04:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 15:04:42 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 15:07:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 15:07:34 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 09 15:08:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 15:08:36 fir-md1-s1 kernel: Lustre: Skipped 104 previous similar messages Aug 09 15:15:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 15:15:09 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 15:17:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 15:17:51 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 09 15:18:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 15:18:47 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 09 15:23:56 fir-md1-s1 kernel: Lustre: 20460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565389429/real 1565389429] req@ffff8f22e622b000 x1636760461671808/t0(0) o106->fir-MDT0000@10.8.12.12@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565389436 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 15:25:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 15:25:09 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 15:26:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not 
available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 15:26:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 15:29:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 15:29:04 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 09 15:29:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 15:29:35 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 15:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 15:35:19 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 15:36:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 15:36:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 15:39:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 15:39:05 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 09 15:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1aa487a000, cur 1565390355 expire 1565390205 last 1565390128 Aug 09 15:40:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 15:40:51 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 15:41:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 15:46:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 15:46:16 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 15:49:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 15:49:08 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 09 15:49:29 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 09 15:51:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 15:51:50 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 15:55:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 15:56:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 15:56:29 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 15:59:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 15:59:28 fir-md1-s1 kernel: Lustre: Skipped 91 previous similar messages Aug 09 16:01:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 16:01:54 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 16:06:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 16:06:38 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 16:09:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 16:09:32 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 09 16:12:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 16:12:02 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 09 16:16:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 16:16:39 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 16:19:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 16:19:33 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 16:22:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 16:22:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 16:23:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: 
not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 16:23:48 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Aug 09 16:26:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 16:26:56 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 16:28:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 16:28:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 16:29:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 16:29:47 fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 09 16:32:23 fir-md1-s1 kernel: Lustre: 23733:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565393535/real 1565393535] req@ffff8f293ba56300 x1636760568385264/t0(0) o104->fir-MDT0000@10.8.10.21@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565393542 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 16:32:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 16:32:46 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 16:37:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 16:37:07 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 16:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 16:39:54 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 09 16:43:21 fir-md1-s1 kernel: Lustre: MGS: Received new 
LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 16:43:21 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 16:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 16:47:17 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 09 16:50:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 16:50:05 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 09 16:53:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 16:53:27 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 16:57:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 16:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 16:57:29 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 17:00:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 17:00:16 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 09 17:05:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 17:05:30 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 17:05:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 17:07:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 17:07:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 17:07:38 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 17:10:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 17:10:16 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 17:13:12 fir-md1-s1 kernel: Lustre: 23695:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565395985/real 1565395985] req@ffff8f2cb466e600 x1636760629995840/t0(0) o104->fir-MDT0000@10.8.30.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565395992 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 17:15:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 17:15:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 17:15:34 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 17:15:34 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 17:18:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 17:18:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 09 17:18:15 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 17:20:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 17:20:26 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 09 17:25:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 17:25:56 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 17:27:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 17:28:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 17:28:33 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 17:30:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 17:30:26 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 09 17:32:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f6253769-e003-83f0-f125-81786f494222 (at 10.8.23.29@o2ib6) in 172 seconds. I think it's dead, and I am evicting it. exp ffff8f42bcb5e800, cur 1565397162 expire 1565397012 last 1565396990 Aug 09 17:33:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f6253769-e003-83f0-f125-81786f494222 (at 10.8.23.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ee307ec00, cur 1565397217 expire 1565397067 last 1565396990 Aug 09 17:35:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 17:35:59 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 17:37:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 17:37:39 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 17:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 17:39:09 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 17:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 17:40:42 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 09 17:46:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 17:46:12 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 17:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 17:49:11 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 09 17:50:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 17:50:44 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 09 17:56:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 17:56:22 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 09 17:58:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 17:59:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 17:59:15 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 18:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 18:00:52 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 09 18:06:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 09 18:06:22 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 09 18:07:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 18:09:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 18:09:45 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 18:11:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 18:11:06 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 18:16:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 18:16:49 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 09 18:19:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 18:20:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 18:20:08 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 18:21:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 09 18:21:17 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Aug 09 18:27:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 18:27:50 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 18:29:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 18:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 18:31:20 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 18:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 18:31:20 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 09 18:38:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 18:38:53 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 09 18:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 18:41:30 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 09 18:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 18:41:30 fir-md1-s1 kernel: Lustre: Skipped 93 previous similar messages Aug 09 18:43:56 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: haven't heard from client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3518253000, cur 1565401436 expire 1565401286 last 1565401209 Aug 09 18:43:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 18:44:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 18:49:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 18:49:18 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 18:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 18:52:10 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 18:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 18:52:10 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 09 18:55:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 18:55:00 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 18:59:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 18:59:24 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 09 19:02:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 19:02:16 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 09 19:02:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 19:02:17 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 09 19:08:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e17663800, cur 1565402925 expire 1565402775 last 1565402698 Aug 09 19:10:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 19:10:18 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 19:13:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 19:13:19 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 19:13:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 19:13:19 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 09 19:13:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 19:13:44 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 19:20:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 19:20:32 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 09 19:24:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 19:24:18 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 09 19:24:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 19:24:36 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 19:31:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 19:31:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 19:31:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 19:31:52 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages Aug 09 19:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 19:34:25 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Aug 09 19:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 19:34:58 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 19:42:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 19:42:07 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 09 19:46:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 
10.8.11.6@o2ib6) Aug 09 19:46:08 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 09 19:46:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 19:46:43 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 09 19:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 19:47:16 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 19:52:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 19:52:13 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 09 19:56:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 19:56:08 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 09 19:58:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 19:58:25 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 09 19:58:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 19:58:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 20:02:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 20:02:18 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 09 20:02:18 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406131/real 1565406131] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406138 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 09 20:02:25 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406138/real 1565406138] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406145 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:02:26 fir-md1-s1 kernel: Lustre: 97649:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1a22edf200 x1631597038646160/t0(0) o101->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:1/0 lens 480/568 e 1 to 0 dl 1565406151 ref 2 fl Interpret:/0/0 rc 0/0 Aug 09 20:02:26 fir-md1-s1 kernel: Lustre: 97649:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 09 20:02:32 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406145/real 1565406145] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406152 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:02:39 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406152/real 1565406152] req@ffff8f19823e3f00 x1636760805271120/t0(0) 
o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406159 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:02:46 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406159/real 1565406159] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406166 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:03:00 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406173/real 1565406173] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406180 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:03:00 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 09 20:03:21 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406194/real 1565406194] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406201 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:03:21 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 09 20:03:56 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406229/real 1565406229] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406236 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:03:56 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 09 20:05:06 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565406299/real 
1565406299] req@ffff8f19823e3f00 x1636760805271120/t0(0) o106->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565406306 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 09 20:05:06 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 09 20:05:31 fir-md1-s1 kernel: LNet: Service thread pid 22283 was inactive for 200.57s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 09 20:05:31 fir-md1-s1 kernel: Pid: 22283, comm: mdt01_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 09 20:05:31 fir-md1-s1 kernel: Call Trace: Aug 09 20:05:31 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 09 20:05:32 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 09 20:05:32 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 09 20:05:32 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 09 20:05:32 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 09 20:05:32 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 09 20:05:32 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 09 20:05:32 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 09 20:05:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565406332.22283 Aug 09 20:05:34 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: haven't heard from client 824d5035-68db-e5f9-7a5c-0210e280b617 (at 10.8.27.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ceaa1c800, cur 1565406334 expire 1565406184 last 1565406107 Aug 09 20:05:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 824d5035-68db-e5f9-7a5c-0210e280b617 (at 10.8.27.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1de5880000, cur 1565406338 expire 1565406188 last 1565406111 Aug 09 20:05:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 20:05:38 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:187s); client may timeout. req@ffff8f1a22edf200 x1631597038646160/t0(0) o101->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:1/0 lens 480/536 e 1 to 0 dl 1565406151 ref 1 fl Complete:/0/0 rc 301/301 Aug 09 20:05:38 fir-md1-s1 kernel: LustreError: 20734:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2d2b20ec00 x1636760807869488/t0(0) o104->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 09 20:05:38 fir-md1-s1 kernel: LNet: Service thread pid 22283 completed after 206.69s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 09 20:05:39 fir-md1-s1 kernel: LustreError: 20731:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1a5e8c9b00 x1636760807884192/t0(0) o104->fir-MDT0000@10.8.27.10@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 09 20:06:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 20:06:19 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 09 20:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 20:08:51 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 20:13:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 20:13:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 09 20:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 20:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 20:16:30 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Aug 09 20:16:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 20:16:55 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 20:19:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 20:19:12 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 20:26:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 20:26:35 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 20:26:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 20:26:35 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 09 20:29:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 20:29:31 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 20:31:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 20:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 20:37:03 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Aug 09 20:37:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 20:37:59 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 09 20:39:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 20:39:57 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 20:45:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 20:47:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 20:47:09 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 09 20:50:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 20:50:23 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 20:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 20:50:25 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 20:55:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 20:55:52 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages Aug 09 20:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 20:57:19 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Aug 09 21:00:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 21:00:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 21:06:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 21:06:12 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages Aug 09 21:06:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 21:06:53 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 09 21:07:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 21:07:21 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 09 21:12:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 21:12:20 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 21:17:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 21:17:42 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 09 21:17:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 21:17:42 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 09 21:21:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 21:22:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 21:22:31 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 21:27:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 21:27:48 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 09 21:27:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 21:27:50 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 09 21:33:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 21:33:28 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 21:36:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 21:38:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 21:38:06 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 09 21:38:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 21:38:06 fir-md1-s1 kernel: Lustre: Skipped 95 previous similar messages Aug 09 21:43:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 21:43:36 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 09 21:48:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 21:48:11 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 09 21:48:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 21:48:11 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 09 21:52:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 21:54:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 21:54:18 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 21:58:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 21:58:18 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 09 21:58:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 21:58:18 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Aug 09 22:04:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 22:04:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 22:04:44 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 09 22:08:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 22:08:29 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 22:11:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 22:11:12 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 09 22:14:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 22:14:55 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 09 22:15:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 22:15:44 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 22:18:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 09 22:18:37 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 09 22:22:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 09 22:22:18 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 09 22:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 22:25:34 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 22:28:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 22:28:39 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 09 22:32:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 22:32:26 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 09 22:36:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 22:36:30 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 09 22:38:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 22:38:44 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 09 22:42:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 22:42:32 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 09 22:44:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 22:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 22:47:30 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 22:49:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 22:49:11 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Aug 09 22:51:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 22:52:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 22:52:40 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 09 22:58:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 09 22:58:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 09 22:58:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 22:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 22:59:20 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 09 23:03:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 23:03:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 23:05:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 09 23:06:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ef5ea156-ac74-87c0-bd6c-5f4354bb2230 (at 10.8.30.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e7f6d7c00, cur 1565417165 expire 1565417015 last 1565416938 Aug 09 23:09:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 23:09:32 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 09 23:09:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 23:09:33 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 09 23:13:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 23:13:22 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 09 23:19:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 23:19:37 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages Aug 09 23:19:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 23:19:37 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 09 23:23:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 23:23:35 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 09 23:29:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting Aug 09 23:29:55 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 09 23:29:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 09 23:29:55 
fir-md1-s1 kernel: Lustre: Skipped 97 previous similar messages Aug 09 23:33:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 23:33:36 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 09 23:36:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 23:40:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 09 23:40:02 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 09 23:40:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 09 23:40:02 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 09 23:44:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 23:44:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 09 23:48:05 fir-md1-s1 kernel: mlx5_0:dump_cqe:286:(pid 20189): dump error cqe Aug 09 23:48:05 fir-md1-s1 kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Aug 09 23:48:05 fir-md1-s1 kernel: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Aug 09 23:48:05 fir-md1-s1 kernel: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Aug 09 23:48:05 fir-md1-s1 kernel: 00000030: 00 00 00 00 00 00 89 14 0a 00 02 31 63 ef 0b d3 Aug 09 23:48:05 fir-md1-s1 kernel: Lustre: 20207:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1565419685/real 1565419685] req@ffff8f12b7bd9200 x1636760935139840/t0(0) o41->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1565419716 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Aug 09 23:48:05 fir-md1-s1 kernel: Lustre: fir-MDT0001-osp-MDT0002: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Aug 09 23:48:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 23:48:05 fir-md1-s1 kernel: Lustre: 20207:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Aug 09 23:48:11 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 1 seconds Aug 09 23:48:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 09 23:48:11 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 09 23:48:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:48:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Aug 09 23:48:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for 
connect from 10.9.113.9@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 09 23:49:07 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:49:57 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:49:57 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Aug 09 23:50:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 09 23:50:28 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 09 23:50:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 09 23:50:28 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 09 23:50:47 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:50:47 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Aug 09 23:51:38 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 1 seconds Aug 09 23:51:38 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Aug 09 23:51:44 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 323e9462-2806-288b-427b-09b4875db405 (at 10.0.10.52@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f153524fc00, cur 1565419904 expire 1565419754 last 1565419677 Aug 09 23:51:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 09 23:51:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fir-MDT0001-mdtlov_UUID (at 10.0.10.52@o2ib7) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2533287400, cur 1565419906 expire 1565419756 last 1565419679 Aug 09 23:51:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fir-MDT0003-mdtlov_UUID (at 10.0.10.52@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45362dfc00, cur 1565419907 expire 1565419757 last 1565419680 Aug 09 23:51:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 09 23:52:27 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:52:27 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Aug 09 23:53:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:53:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages Aug 09 23:53:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.109.34@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 09 23:53:29 fir-md1-s1 kernel: LustreError: Skipped 14467 previous similar messages Aug 09 23:54:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 09 23:54:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 7 previous similar messages Aug 09 23:57:29 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 1 seconds Aug 09 23:57:29 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 11 previous similar messages Aug 09 23:58:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 09 23:58:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 00:00:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 00:00:48 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 00:00:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 00:00:48 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 10 00:02:29 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 00:02:29 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 23 previous similar messages Aug 10 00:03:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.104.8@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 00:03:30 fir-md1-s1 kernel: LustreError: Skipped 20730 previous similar messages Aug 10 00:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 347bb271-3678-9ce7-b4c0-741140599723 (at 10.8.23.12@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1784140800, cur 1565420821 expire 1565420671 last 1565420594 Aug 10 00:07:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 00:08:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 00:08:23 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 10 00:10:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 00:10:59 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages Aug 10 00:11:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 00:11:24 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 10 00:11:41 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 00:11:41 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 43 previous similar messages Aug 10 00:13:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.104.21@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 00:13:30 fir-md1-s1 kernel: LustreError: Skipped 23081 previous similar messages Aug 10 00:19:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 00:19:38 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 10 00:21:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 10 00:21:38 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Aug 10 00:21:43 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 00:21:43 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 00:22:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 00:22:16 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 00:23:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.18.30@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 00:23:30 fir-md1-s1 kernel: LustreError: Skipped 20792 previous similar messages
Aug 10 00:30:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID
Aug 10 00:30:05 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages
Aug 10 00:31:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6)
Aug 10 00:31:40 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages
Aug 10 00:31:45 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds
Aug 10 00:31:45 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages
Aug 10 00:32:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting
Aug 10 00:32:26 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages
Aug 10 00:33:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.13.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 10 00:33:30 fir-md1-s1 kernel: LustreError: Skipped 22293 previous similar messages
Aug 10 00:41:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds
Aug 10 00:41:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages
Aug 10 00:43:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) reconnecting
Aug 10 00:43:03 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages
Aug 10 00:43:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6)
Aug 10 00:43:03 fir-md1-s1 kernel: Lustre: Skipped 78 previous similar messages
Aug 10 00:43:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.19@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 10 00:43:30 fir-md1-s1 kernel: LustreError: Skipped 20540 previous similar messages
Aug 10 00:44:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID
Aug 10 00:44:52 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages
Aug 10 00:51:50 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds
Aug 10 00:51:50 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages
Aug 10 00:53:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6)
Aug 10 00:53:09 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages
Aug 10 00:53:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.57@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 10 00:53:30 fir-md1-s1 kernel: LustreError: Skipped 22721 previous similar messages Aug 10 00:53:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 00:53:48 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 10 00:54:41 fir-md1-s1 kernel: Lustre: 23649:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a6da70900 x1639332118403744/t0(0) o36->39e76845-4976-21c9-38bb-bb738759d72c@10.9.0.64@o2ib4:16/0 lens 608/2888 e 1 to 0 dl 1565423686 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:54:51 fir-md1-s1 kernel: Lustre: 20457:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a7224b600 x1631644024213488/t0(0) o36->339627b1-f298-e293-3cc1-dc6c48f43358@10.9.104.56@o2ib4:26/0 lens 552/2888 e 1 to 0 dl 1565423696 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:54:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 00:54:53 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 10 00:54:56 fir-md1-s1 kernel: Lustre: 21677:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2cba977b00 x1631559519779648/t0(0) o36->f7d39296-2681-999e-c9dd-38a3ef8bf584@10.9.106.15@o2ib4:1/0 lens 552/2888 e 1 to 0 dl 1565423701 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:55:11 fir-md1-s1 kernel: Lustre: 97670:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1601213f00 x1636458739303056/t0(0) o36->76ed0af9-aa81-fdfa-a462-54cb6855d00e@10.9.106.20@o2ib4:16/0 lens 552/2888 e 0 to 0 dl 1565423716 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:55:15 fir-md1-s1 kernel: Lustre: 20720:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff8f2431f56600 x1636433857383632/t0(0) o36->fc9cbc0d-41e6-18a0-ddfe-91c390cc7652@10.9.108.7@o2ib4:20/0 lens 544/2888 e 1 to 0 dl 1565423720 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:55:39 fir-md1-s1 kernel: Lustre: 21372:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3ea162f500 x1638239923982160/t0(0) o36->d1d37c59-1aef-cee4-6611-3ad516d77ba1@10.9.108.39@o2ib4:14/0 lens 752/2888 e 0 to 0 dl 1565423744 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 00:56:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 00:56:06 fir-md1-s1 kernel: LustreError: 23560:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423676, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f10abf30b40/0x5d9ee6c2ffc34257 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c2ffc3425e expref: -99 pid: 23560 timeout: 0 lvb_type: 0 Aug 10 00:56:11 fir-md1-s1 kernel: LustreError: 50580:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423681, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2840eac5c0/0x5d9ee6c2ffc60c73 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c2ffc60c7a expref: -99 pid: 50580 timeout: 0 lvb_type: 0 Aug 10 00:56:12 fir-md1-s1 kernel: Lustre: 21368:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f08ea3ace00 x1631556572616832/t0(0) 
o36->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:17/0 lens 568/2888 e 0 to 0 dl 1565423777 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:56:16 fir-md1-s1 kernel: LustreError: 97669:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423686, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f17e7981200/0x5d9ee6c2ffc8cd89 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 9 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c2ffc8cd90 expref: -99 pid: 97669 timeout: 0 lvb_type: 0 Aug 10 00:56:30 fir-md1-s1 kernel: LustreError: 97638:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423700, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f1dc06b2f40/0x5d9ee6c2ffcf8035 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 9 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c2ffcf803c expref: -99 pid: 97638 timeout: 0 lvb_type: 0 Aug 10 00:56:44 fir-md1-s1 kernel: LustreError: 23717:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423714, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1da6c41f80/0x5d9ee6c2ffd659e6 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23717 timeout: 0 lvb_type: 0 Aug 10 00:56:55 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20f5061200 x1634138338083936/t0(0) o36->190e8c90-938d-b7f6-84df-7662b8e78e53@10.9.107.71@o2ib4:0/0 lens 592/2888 e 0 to 0 dl 1565423820 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:56:55 fir-md1-s1 kernel: Lustre: 22286:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 
previous similar messages Aug 10 00:57:17 fir-md1-s1 kernel: LustreError: 23636:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423747, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1b5045aac0/0x5d9ee6c2ffe63584 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23636 timeout: 0 lvb_type: 0 Aug 10 00:57:17 fir-md1-s1 kernel: LustreError: 20457:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423747, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2041677500/0x5d9ee6c2ffe69d11 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 10 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c2ffe69d18 expref: -99 pid: 20457 timeout: 0 lvb_type: 0 Aug 10 00:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 00:57:46 fir-md1-s1 kernel: LNet: Service thread pid 23562 was inactive for 200.59s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 00:57:46 fir-md1-s1 kernel: Pid: 23562, comm: mdt00_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 00:57:46 fir-md1-s1 kernel: Call Trace: Aug 10 00:57:46 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 10 00:57:46 fir-md1-s1 kernel: [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] Aug 10 00:57:46 fir-md1-s1 kernel: [] osp_remote_sync+0xd3/0x200 [osp] Aug 10 00:57:46 fir-md1-s1 kernel: [] osp_attr_get+0x463/0x730 [osp] Aug 10 00:57:46 fir-md1-s1 kernel: [] osp_object_init+0x16d/0x2d0 [osp] Aug 10 00:57:46 fir-md1-s1 kernel: [] lu_object_alloc+0xe5/0x320 [obdclass] Aug 10 00:57:47 fir-md1-s1 kernel: [] lu_object_find_at+0x76/0x280 [obdclass] Aug 10 00:57:47 fir-md1-s1 kernel: [] lu_object_find_slice+0x1f/0x90 [obdclass] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdd_object_find+0x10/0x70 [mdd] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdd_is_parent+0xa2/0x1a0 [mdd] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdd_is_subdir+0x204/0x240 [mdd] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdt_reint_rename+0x37e/0x2b90 [mdt] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 00:57:47 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 00:57:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 00:57:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 00:57:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 00:57:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 00:57:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 00:57:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 00:57:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423867.23562 Aug 10 00:57:56 fir-md1-s1 kernel: LNet: Service thread pid 23560 was inactive for 200.35s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 00:57:56 fir-md1-s1 kernel: Pid: 23560, comm: mdt00_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 00:57:56 fir-md1-s1 kernel: Call Trace: Aug 10 00:57:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 00:57:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 00:57:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 00:57:56 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 00:57:56 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 00:57:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 00:57:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 00:57:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 00:57:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 00:57:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 00:57:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 00:57:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423876.23560 Aug 10 00:58:00 fir-md1-s1 kernel: LustreError: 97648:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423790, 90s ago), entering recovery for 
fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f1e63b6e780/0x5d9ee6c2fffa6719 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 10 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c2fffa6720 expref: -99 pid: 97648 timeout: 0 lvb_type: 0 Aug 10 00:58:00 fir-md1-s1 kernel: LustreError: 97648:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Aug 10 00:58:02 fir-md1-s1 kernel: LNet: Service thread pid 50580 was inactive for 200.61s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 00:58:02 fir-md1-s1 kernel: Pid: 50580, comm: mdt02_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 00:58:02 fir-md1-s1 kernel: Call Trace: Aug 10 00:58:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 00:58:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 00:58:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 00:58:02 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 00:58:02 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 00:58:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 00:58:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 00:58:02 fir-md1-s1 kernel: 
[] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 00:58:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 00:58:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 00:58:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 00:58:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423882.50580 Aug 10 00:58:07 fir-md1-s1 kernel: LNet: Service thread pid 97669 was inactive for 200.53s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 00:58:07 fir-md1-s1 kernel: Pid: 97669, comm: mdt01_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 00:58:07 fir-md1-s1 kernel: Call Trace: Aug 10 00:58:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 00:58:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 00:58:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 00:58:07 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 00:58:07 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 00:58:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 00:58:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 00:58:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 00:58:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 00:58:07 fir-md1-s1 
kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 00:58:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 00:58:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423887.97669 Aug 10 00:58:20 fir-md1-s1 kernel: LNet: Service thread pid 97638 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 00:58:20 fir-md1-s1 kernel: Pid: 97638, comm: mdt01_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 00:58:20 fir-md1-s1 kernel: Call Trace: Aug 10 00:58:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 00:58:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 00:58:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 00:58:20 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 00:58:20 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 00:58:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 00:58:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 00:58:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 00:58:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 00:58:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 00:58:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 00:58:20 
fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423900.97638 Aug 10 00:58:34 fir-md1-s1 kernel: LNet: Service thread pid 23717 was inactive for 200.33s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 00:58:34 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 00:58:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423914.23717 Aug 10 00:59:07 fir-md1-s1 kernel: LNet: Service thread pid 23636 was inactive for 200.57s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 00:59:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423947.23636 Aug 10 00:59:12 fir-md1-s1 kernel: LNet: Service thread pid 22284 was inactive for 200.38s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 00:59:12 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 00:59:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423952.22284 Aug 10 00:59:28 fir-md1-s1 kernel: Lustre: 27321:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f132b94cb00 x1631581616383600/t0(0) o36->7c13660c-b743-3f11-23de-3221a2e02958@10.9.106.40@o2ib4:3/0 lens 560/2888 e 0 to 0 dl 1565423973 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 00:59:33 fir-md1-s1 kernel: LNet: Service thread pid 21003 was inactive for 200.31s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 00:59:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423973.21003 Aug 10 00:59:50 fir-md1-s1 kernel: LNet: Service thread pid 97648 was inactive for 200.30s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 10 00:59:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565423990.97648 Aug 10 01:00:33 fir-md1-s1 kernel: LustreError: 27320:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565423943, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ef7d05100/0x5d9ee6c30040f6b5 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 15 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 27320 timeout: 0 lvb_type: 0 Aug 10 01:01:33 fir-md1-s1 kernel: LustreError: 23582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424003, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3234fdf980/0x5d9ee6c3005ba9ef lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 17 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23582 timeout: 0 lvb_type: 0 Aug 10 01:01:57 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 01:01:57 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 01:01:58 fir-md1-s1 kernel: LustreError: 23555:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424028, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0e0a9de300/0x5d9ee6c30066e59f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 18 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23555 timeout: 0 lvb_type: 0 Aug 10 01:02:12 fir-md1-s1 kernel: Lustre: 10147:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f4021e04200 x1634181808664784/t0(0) 
o36->4072b1d6-edd3-180c-dea8-8ff7b460e07f@10.9.109.70@o2ib4:17/0 lens 560/2888 e 0 to 0 dl 1565424137 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 01:02:12 fir-md1-s1 kernel: Lustre: 10147:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 10 01:02:23 fir-md1-s1 kernel: LNet: Service thread pid 27320 was inactive for 200.22s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 01:02:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424143.27320 Aug 10 01:02:28 fir-md1-s1 kernel: LustreError: 23749:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424058, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f33e2a7c140/0x5d9ee6c300744ebd lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 18 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23749 timeout: 0 lvb_type: 0 Aug 10 01:02:33 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 01:02:33 fir-md1-s1 kernel: LustreError: 20720:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424063, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2520f51b00/0x5d9ee6c30076526a lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 11 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c300765271 expref: -99 pid: 20720 timeout: 0 lvb_type: 0 Aug 10 01:03:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6855e9e0-65ba-17ef-48f8-cf674cb5aba9 (at 10.9.106.14@o2ib4) Aug 10 01:03:09 fir-md1-s1 kernel: Lustre: Skipped 303 previous similar messages Aug 10 01:03:17 fir-md1-s1 kernel: LustreError: 50576:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) 
### lock timed out (enqueued at 1565424107, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2c1ac61680/0x5d9ee6c3008ac536 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 18 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50576 timeout: 0 lvb_type: 0 Aug 10 01:03:23 fir-md1-s1 kernel: LNet: Service thread pid 23582 was inactive for 200.66s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 01:03:23 fir-md1-s1 kernel: Pid: 23582, comm: mdt03_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:03:23 fir-md1-s1 kernel: Call Trace: Aug 10 01:03:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:03:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:03:23 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:03:23 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:03:23 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:03:23 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:03:23 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:03:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:03:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:03:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:03:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:03:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:03:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:03:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424203.23582 Aug 10 01:03:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.109.10@o2ib4 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 10 01:03:30 fir-md1-s1 kernel: LustreError: Skipped 20552 previous similar messages Aug 10 01:03:48 fir-md1-s1 kernel: LNet: Service thread pid 23555 was inactive for 200.19s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 01:03:48 fir-md1-s1 kernel: Pid: 23555, comm: mdt00_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:03:48 fir-md1-s1 kernel: Call Trace: Aug 10 01:03:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:03:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:03:48 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:03:48 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:03:48 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:03:48 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:03:48 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:03:48 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:03:48 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:03:48 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:03:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:03:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:03:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:03:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424228.23555 Aug 10 01:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4072b1d6-edd3-180c-dea8-8ff7b460e07f (at 10.9.109.70@o2ib4) reconnecting Aug 10 01:03:52 fir-md1-s1 kernel: Lustre: Skipped 266 previous similar messages Aug 10 01:04:19 fir-md1-s1 kernel: Pid: 23749, comm: mdt02_100 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:04:19 fir-md1-s1 kernel: Call Trace: 
Aug 10 01:04:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:04:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:04:19 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:04:19 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:04:19 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:04:19 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:04:19 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:04:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:04:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:04:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:04:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:04:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:04:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:04:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424259.23749 Aug 10 01:04:23 fir-md1-s1 kernel: LNet: Service thread pid 20720 was inactive for 200.10s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 01:04:23 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 01:04:23 fir-md1-s1 kernel: Pid: 20720, comm: mdt01_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:04:23 fir-md1-s1 kernel: Call Trace: Aug 10 01:04:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:04:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 01:04:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 01:04:23 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 01:04:23 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:04:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:04:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:04:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:04:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:04:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:04:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:04:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424263.20720 Aug 10 01:05:08 fir-md1-s1 kernel: Pid: 50576, comm: mdt03_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:05:08 fir-md1-s1 kernel: Call Trace: Aug 10 01:05:08 fir-md1-s1 
kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:05:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:05:08 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:05:08 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:05:08 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:05:08 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:05:08 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:05:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:05:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:05:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:05:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:05:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:05:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:05:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424308.50576 Aug 10 01:06:29 fir-md1-s1 kernel: Lustre: 10147:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3d6271ef00 x1631570617228496/t0(0) o36->ad682b46-15bf-0f3a-4bf3-bd0a52dcefe5@10.9.105.8@o2ib4:4/0 lens 560/2888 e 0 to 0 dl 1565424394 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 01:06:29 fir-md1-s1 kernel: Lustre: 10147:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 01:07:23 fir-md1-s1 kernel: LustreError: 21415:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424353, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3505385a00/0x5d9ee6c300f66fe7 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 20 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21415 timeout: 0 lvb_type: 0 Aug 10 01:09:14 fir-md1-s1 kernel: LNet: Service thread 
pid 21415 was inactive for 200.57s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 01:09:14 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 01:09:14 fir-md1-s1 kernel: Pid: 21415, comm: mdt02_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:09:14 fir-md1-s1 kernel: Call Trace: Aug 10 01:09:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:09:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:09:14 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:09:14 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:09:14 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:09:14 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:09:14 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:09:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:09:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:09:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:09:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:09:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:09:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:09:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424554.21415 Aug 10 01:09:24 fir-md1-s1 kernel: Pid: 27316, comm: mdt03_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:09:24 fir-md1-s1 kernel: Call Trace: Aug 10 01:09:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:09:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:09:24 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:09:24 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:09:24 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 
[mdt] Aug 10 01:09:24 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:09:24 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:09:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:09:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:09:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:09:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:09:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:09:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:09:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424564.27316 Aug 10 01:10:23 fir-md1-s1 kernel: LustreError: 23454:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424533, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f17d2fd69c0/0x5d9ee6c301489bb9 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 23 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23454 timeout: 0 lvb_type: 0 Aug 10 01:10:23 fir-md1-s1 kernel: LustreError: 23454:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 10 01:12:13 fir-md1-s1 kernel: LNet: Service thread pid 23454 was inactive for 200.09s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 01:12:13 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 01:12:13 fir-md1-s1 kernel: Pid: 23454, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:12:13 fir-md1-s1 kernel: Call Trace: Aug 10 01:12:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:12:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:12:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:12:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424733.23454 Aug 10 01:12:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 01:12:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 01:12:13 fir-md1-s1 kernel: Pid: 21127, comm: mdt02_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:12:13 fir-md1-s1 kernel: Call Trace: Aug 10 01:12:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] 
mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:12:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:12:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:12:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:12:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:12:21 fir-md1-s1 kernel: Pid: 21671, comm: mdt02_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:12:21 fir-md1-s1 kernel: Call Trace: Aug 10 01:12:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:12:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:12:21 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:12:21 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:12:21 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:12:21 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:12:21 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:12:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:12:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:12:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:12:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:12:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:12:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:12:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424741.21671 Aug 10 01:12:24 fir-md1-s1 kernel: 
Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 01:12:24 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 10 01:13:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5ba49a0b-6992-5fed-3d51-62bafc05f9db (at 10.9.104.56@o2ib4) Aug 10 01:13:10 fir-md1-s1 kernel: Lustre: Skipped 468 previous similar messages Aug 10 01:13:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.25.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 01:13:30 fir-md1-s1 kernel: LustreError: Skipped 23096 previous similar messages Aug 10 01:13:35 fir-md1-s1 kernel: LustreError: 97651:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565424725, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1d07fb4800/0x5d9ee6c3019cd151 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 29 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97651 timeout: 0 lvb_type: 0 Aug 10 01:13:35 fir-md1-s1 kernel: LustreError: 97651:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 10 01:13:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 339627b1-f298-e293-3cc1-dc6c48f43358 (at 10.9.104.56@o2ib4) reconnecting Aug 10 01:13:52 fir-md1-s1 kernel: Lustre: Skipped 476 previous similar messages Aug 10 01:13:59 fir-md1-s1 kernel: LNet: Service thread pid 21369 was inactive for 200.63s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 01:13:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424839.21369 Aug 10 01:14:11 fir-md1-s1 kernel: LNet: Service thread pid 24582 was inactive for 200.03s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 10 01:14:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424851.24582 Aug 10 01:15:26 fir-md1-s1 kernel: Pid: 97651, comm: mdt01_090 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:15:26 fir-md1-s1 kernel: Call Trace: Aug 10 01:15:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:15:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:15:26 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:15:26 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:15:26 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:15:26 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:15:26 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:15:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:15:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:15:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:15:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:15:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:15:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:15:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424926.97651 Aug 10 01:15:49 fir-md1-s1 kernel: Pid: 21145, comm: mdt03_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:15:49 fir-md1-s1 kernel: Call Trace: Aug 10 01:15:49 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:15:49 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:15:49 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:15:49 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:15:49 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:15:49 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:15:49 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 
[mdt] Aug 10 01:15:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:15:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:15:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:15:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:15:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:15:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:15:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424949.21145 Aug 10 01:16:14 fir-md1-s1 kernel: Lustre: 20719:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f159f5d6900 x1638809001683824/t0(0) o36->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:19/0 lens 616/2888 e 0 to 0 dl 1565424979 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 01:16:14 fir-md1-s1 kernel: Lustre: 20719:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Aug 10 01:16:21 fir-md1-s1 kernel: Pid: 20725, comm: mdt01_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:16:21 fir-md1-s1 kernel: Call Trace: Aug 10 01:16:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:16:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:16:21 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:16:21 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:16:21 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:16:21 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:16:21 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:16:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:16:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:16:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:16:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:16:21 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:16:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:16:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565424981.20725 Aug 10 01:16:53 fir-md1-s1 kernel: LNet: Service thread pid 23581 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 01:16:53 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Aug 10 01:16:53 fir-md1-s1 kernel: Pid: 23581, comm: mdt02_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:16:53 fir-md1-s1 kernel: Call Trace: Aug 10 01:16:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:16:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:16:53 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:16:53 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:16:53 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:16:53 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:16:53 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:16:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:16:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:16:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:16:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:16:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:16:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:16:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425013.23581 Aug 10 01:19:10 fir-md1-s1 kernel: Pid: 21455, comm: mdt01_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:19:10 fir-md1-s1 kernel: Call Trace: Aug 10 01:19:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:19:10 
fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:19:10 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:19:10 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:19:10 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:19:10 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:19:10 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:19:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:19:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:19:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:19:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:19:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:19:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:19:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425150.21455 Aug 10 01:19:35 fir-md1-s1 kernel: LNet: Service thread pid 23565 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 10 01:19:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425175.23565 Aug 10 01:19:43 fir-md1-s1 kernel: LustreError: 23677:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565425093, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f19694bb180/0x5d9ee6c3023da464 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 33 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23677 timeout: 0 lvb_type: 0 Aug 10 01:19:43 fir-md1-s1 kernel: LustreError: 23677:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Aug 10 01:21:33 fir-md1-s1 kernel: Pid: 23677, comm: mdt03_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:21:33 fir-md1-s1 kernel: Call Trace: Aug 10 01:21:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:21:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:21:33 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:21:33 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:21:33 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:21:33 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:21:33 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:21:33 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:21:33 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:21:33 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:21:33 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:21:33 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:21:33 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:21:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425293.23677 Aug 10 01:21:56 fir-md1-s1 kernel: Pid: 22279, comm: mdt01_041 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:21:56 fir-md1-s1 kernel: Call Trace: Aug 10 01:21:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:21:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:21:56 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:21:56 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:21:56 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:21:56 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:21:56 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:21:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:21:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:21:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:21:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:21:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:21:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:21:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425317.22279 Aug 10 01:22:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 01:22:17 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 01:22:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 01:22:44 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 10 01:23:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to bc55e3ad-5761-b99c-db48-1cedf7af21a5 (at 10.9.104.36@o2ib4) Aug 10 01:23:11 fir-md1-s1 kernel: Lustre: Skipped 667 previous similar messages Aug 10 01:23:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.105.13@o2ib4 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 10 01:23:31 fir-md1-s1 kernel: LustreError: Skipped 20915 previous similar messages Aug 10 01:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d594a152-d993-c755-50bf-0f3b806ddc60 (at 10.9.107.22@o2ib4) reconnecting Aug 10 01:23:52 fir-md1-s1 kernel: Lustre: Skipped 672 previous similar messages Aug 10 01:27:13 fir-md1-s1 kernel: Lustre: 23722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f37f17a1b00 x1635091222765504/t0(0) o36->9234d6a3-de0f-63f4-f884-c9cfe5f61af5@10.9.102.6@o2ib4:18/0 lens 560/2888 e 0 to 0 dl 1565425638 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 01:27:13 fir-md1-s1 kernel: Lustre: 23722:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 10 01:28:18 fir-md1-s1 kernel: LustreError: 23727:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565425608, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2c6f500240/0x5d9ee6c3031d884f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 35 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23727 timeout: 0 lvb_type: 0 Aug 10 01:28:18 fir-md1-s1 kernel: LustreError: 23727:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 10 01:30:09 fir-md1-s1 kernel: LNet: Service thread pid 23727 was inactive for 200.65s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 01:30:09 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 10 01:30:09 fir-md1-s1 kernel: Pid: 23727, comm: mdt03_104 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:30:09 fir-md1-s1 kernel: Call Trace: Aug 10 01:30:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:30:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:30:09 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:30:09 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:30:09 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:30:09 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:30:09 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:30:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:30:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:30:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:30:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:30:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:30:09 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:30:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425809.23727 Aug 10 01:30:52 fir-md1-s1 kernel: Pid: 21460, comm: mdt01_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:30:52 fir-md1-s1 kernel: Call Trace: Aug 10 01:30:52 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:30:52 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:30:52 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:30:52 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:30:52 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:30:52 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:30:52 fir-md1-s1 
kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:30:52 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:30:52 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:30:52 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:30:52 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:30:52 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:30:52 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:30:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425852.21460 Aug 10 01:31:51 fir-md1-s1 kernel: Pid: 26253, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:31:51 fir-md1-s1 kernel: Call Trace: Aug 10 01:31:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:31:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:31:51 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:31:51 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:31:51 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:31:51 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:31:51 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:31:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:31:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:31:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:31:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:31:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:31:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:31:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425911.26253 Aug 10 01:32:02 fir-md1-s1 kernel: Pid: 20996, comm: mdt02_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:32:02 fir-md1-s1 kernel: Call Trace: Aug 10 01:32:02 fir-md1-s1 kernel: [] 
ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:32:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:32:02 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:32:02 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:32:02 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:32:02 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:32:02 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:32:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:32:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:32:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:32:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:32:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:32:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:32:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425922.20996 Aug 10 01:32:19 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 01:32:19 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 01:32:23 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 01:32:23 fir-md1-s1 kernel: LustreError: 23726:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565425853, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f1d7a51c5c0/0x5d9ee6c30384d6fe lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c30384d705 expref: -99 pid: 23726 timeout: 0 lvb_type: 0 Aug 10 01:32:54 fir-md1-s1 kernel: Pid: 20541, comm: mdt00_003 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:32:54 fir-md1-s1 kernel: Call Trace: Aug 10 01:32:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:32:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:32:54 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:32:54 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:32:54 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:32:54 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:32:54 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:32:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:32:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:32:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:32:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:32:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:32:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:32:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565425974.20541 Aug 10 01:33:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to dc956aa1-2e62-6a58-d1a0-d773f7b71344 (at 10.9.107.12@o2ib4) Aug 10 01:33:11 fir-md1-s1 kernel: Lustre: Skipped 807 previous similar messages Aug 10 01:33:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.25.15@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 01:33:31 fir-md1-s1 kernel: LustreError: Skipped 22549 previous similar messages Aug 10 01:33:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f7d39296-2681-999e-c9dd-38a3ef8bf584 (at 10.9.106.15@o2ib4) reconnecting Aug 10 01:33:54 fir-md1-s1 kernel: Lustre: Skipped 766 previous similar messages Aug 10 01:34:14 fir-md1-s1 kernel: LNet: Service thread pid 23726 was inactive for 200.48s. 
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 01:34:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426054.23726 Aug 10 01:34:28 fir-md1-s1 kernel: LNet: Service thread pid 21667 was inactive for 200.65s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 10 01:34:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426068.21667 Aug 10 01:38:19 fir-md1-s1 kernel: Lustre: 24580:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f32b36a9200 x1635092522992368/t0(0) o36->e7b57212-3a7e-4064-e6ff-77f892effff8@10.9.109.22@o2ib4:24/0 lens 560/2888 e 0 to 0 dl 1565426304 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 01:38:19 fir-md1-s1 kernel: Lustre: 24580:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 10 01:38:31 fir-md1-s1 kernel: Pid: 97670, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:38:31 fir-md1-s1 kernel: Call Trace: Aug 10 01:38:31 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:38:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:38:31 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:38:31 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:38:31 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:38:31 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:38:31 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:38:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:38:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:38:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:38:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:38:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:38:31 fir-md1-s1 kernel: 
[] 0xffffffffffffffff Aug 10 01:38:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426311.97670 Aug 10 01:38:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 01:38:49 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 10 01:39:24 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565426274, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1fc68a9d40/0x5d9ee6c3043ce865 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 42 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22283 timeout: 0 lvb_type: 0 Aug 10 01:39:24 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Aug 10 01:41:14 fir-md1-s1 kernel: LNet: Service thread pid 22283 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 01:41:14 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Aug 10 01:41:14 fir-md1-s1 kernel: Pid: 22283, comm: mdt01_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:41:14 fir-md1-s1 kernel: Call Trace: Aug 10 01:41:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:41:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:41:14 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:41:14 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:41:14 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:41:14 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:41:14 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:41:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:41:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:41:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:41:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:41:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:41:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:41:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426474.22283 Aug 10 01:42:30 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 01:42:30 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 01:43:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6558db06-e84b-e314-9758-e1f758d5cd4e (at 10.9.107.14@o2ib4) Aug 10 01:43:12 fir-md1-s1 kernel: Lustre: Skipped 875 previous similar messages Aug 10 01:43:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.27@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 10 01:43:31 fir-md1-s1 kernel: LustreError: Skipped 20887 previous similar messages Aug 10 01:43:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4fd3697b-8ac3-d03c-d547-c2a2aae5b292 (at 10.8.28.8@o2ib6) reconnecting Aug 10 01:43:55 fir-md1-s1 kernel: Lustre: Skipped 869 previous similar messages Aug 10 01:45:20 fir-md1-s1 kernel: Pid: 23649, comm: mdt00_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:45:20 fir-md1-s1 kernel: Call Trace: Aug 10 01:45:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:45:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:45:20 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:45:20 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:45:20 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:45:20 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:45:20 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:45:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:45:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:45:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:45:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:45:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:45:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:45:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426720.23649 Aug 10 01:46:10 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 01:46:10 fir-md1-s1 kernel: LustreError: 23588:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565426680, 90s ago), entering recovery for 
fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f23838f60c0/0x5d9ee6c304ebabe9 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 20 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c304ebabf0 expref: -99 pid: 23588 timeout: 0 lvb_type: 0 Aug 10 01:46:10 fir-md1-s1 kernel: LustreError: 23588:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Aug 10 01:46:18 fir-md1-s1 kernel: Pid: 10502, comm: mdt00_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:46:18 fir-md1-s1 kernel: Call Trace: Aug 10 01:46:18 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:46:18 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:46:18 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:46:18 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:46:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:46:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:46:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:46:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:46:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:46:18 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:46:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:46:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:46:18 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:46:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426778.10502 Aug 10 01:47:02 fir-md1-s1 kernel: Pid: 23738, comm: mdt02_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:47:02 fir-md1-s1 kernel: Call Trace: Aug 10 01:47:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:47:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 
10 01:47:02 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:47:02 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:47:02 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:47:02 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:47:02 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:47:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:47:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:47:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:47:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:47:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:47:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:47:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426822.23738 Aug 10 01:47:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f21be837c00, cur 1565426843 expire 1565426693 last 1565426616 Aug 10 01:47:23 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 01:48:00 fir-md1-s1 kernel: Pid: 23692, comm: mdt02_076 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:48:00 fir-md1-s1 kernel: Call Trace: Aug 10 01:48:00 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 01:48:00 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:48:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:48:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:48:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565426880.23692 Aug 10 01:48:00 fir-md1-s1 kernel: Pid: 26254, comm: mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:48:00 fir-md1-s1 kernel: Call Trace: Aug 
10 01:48:00 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 01:48:00 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:48:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:48:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:48:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:48:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:48:00 fir-md1-s1 kernel: LNet: Service thread pid 23588 was inactive for 200.52s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 10 01:48:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 01:48:52 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 01:52:04 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f251bedb300 x1631609479197744/t0(0) o36->da2044d0-4d1f-46be-9f3b-250354ced4dc@10.9.106.2@o2ib4:9/0 lens 560/2888 e 1 to 0 dl 1565427129 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 01:52:04 fir-md1-s1 kernel: Lustre: 20731:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Aug 10 01:52:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 01:52:32 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 01:53:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to df993956-2257-9a73-35ef-341b2f75d156 (at 10.9.106.58@o2ib4) Aug 10 01:53:13 fir-md1-s1 kernel: Lustre: Skipped 1099 previous similar messages Aug 10 01:53:19 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565427109, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f20c3b00b40/0x5d9ee6c3059c679e lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 54 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97672 timeout: 0 lvb_type: 0 Aug 10 01:53:19 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 10 01:53:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.104.35@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 01:53:31 fir-md1-s1 kernel: LustreError: Skipped 22055 previous similar messages Aug 10 01:53:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client e7b57212-3a7e-4064-e6ff-77f892effff8 (at 10.9.109.22@o2ib4) reconnecting Aug 10 01:53:55 fir-md1-s1 kernel: Lustre: Skipped 1090 previous similar messages Aug 10 01:55:10 fir-md1-s1 kernel: LNet: Service thread pid 97672 was inactive for 200.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 01:55:10 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Aug 10 01:55:10 fir-md1-s1 kernel: Pid: 97672, comm: mdt01_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:55:10 fir-md1-s1 kernel: Call Trace: Aug 10 01:55:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:55:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:55:10 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:55:10 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:55:10 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:55:10 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:55:10 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:55:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:55:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:55:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:55:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:55:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:55:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:55:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565427310.97672 Aug 10 01:59:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 10 01:59:02 fir-md1-s1 kernel: Lustre: Skipped 
15 previous similar messages Aug 10 01:59:50 fir-md1-s1 kernel: Pid: 21380, comm: mdt02_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 01:59:50 fir-md1-s1 kernel: Call Trace: Aug 10 01:59:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 01:59:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 01:59:50 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 01:59:50 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 01:59:50 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 01:59:50 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 01:59:50 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 01:59:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 01:59:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 01:59:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 01:59:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 01:59:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 01:59:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 01:59:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565427590.21380 Aug 10 02:02:33 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 02:02:33 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 02:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5ba49a0b-6992-5fed-3d51-62bafc05f9db (at 10.9.104.56@o2ib4) Aug 10 02:03:13 fir-md1-s1 kernel: Lustre: Skipped 1152 previous similar messages Aug 10 02:03:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.15@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 02:03:31 fir-md1-s1 kernel: LustreError: Skipped 20789 previous similar messages Aug 10 02:03:45 fir-md1-s1 kernel: Lustre: 21419:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f152ca56900 x1636455237223616/t0(0) o36->569c80f1-e322-40ae-cf23-d3ca8807a6fa@10.9.102.40@o2ib4:20/0 lens 560/2888 e 0 to 0 dl 1565427830 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 02:03:45 fir-md1-s1 kernel: Lustre: 21419:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 02:03:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1789d80e-ff23-2d17-851e-02c315f81c99 (at 10.9.108.36@o2ib4) reconnecting Aug 10 02:03:56 fir-md1-s1 kernel: Lustre: Skipped 1134 previous similar messages Aug 10 02:04:50 fir-md1-s1 kernel: LustreError: 10308:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565427800, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f185a7398c0/0x5d9ee6c306b6570f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 56 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10308 timeout: 0 lvb_type: 0 Aug 10 02:04:50 fir-md1-s1 kernel: LustreError: 10308:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 10 02:06:41 fir-md1-s1 kernel: LNet: Service thread pid 10308 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 02:06:41 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 02:06:41 fir-md1-s1 kernel: Pid: 10308, comm: mdt00_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 02:06:41 fir-md1-s1 kernel: Call Trace: Aug 10 02:06:41 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 02:06:41 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 02:06:41 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 02:06:41 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 02:06:41 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 02:06:41 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 02:06:41 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 02:06:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 02:06:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 02:06:41 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 02:06:41 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 02:06:41 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 02:06:41 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 02:06:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565428001.10308 Aug 10 02:09:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 02:09:06 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 10 02:12:34 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 02:12:34 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 02:13:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a4efabe6-b524-c6c8-7da8-a22ef57bfe19 (at 10.9.105.8@o2ib4) Aug 10 02:13:15 fir-md1-s1 kernel: Lustre: 
Skipped 1223 previous similar messages Aug 10 02:13:21 fir-md1-s1 kernel: Pid: 97639, comm: mdt01_078 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 02:13:21 fir-md1-s1 kernel: Call Trace: Aug 10 02:13:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 02:13:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 02:13:21 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 02:13:21 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 02:13:21 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 02:13:21 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 02:13:21 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 02:13:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 02:13:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 02:13:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 02:13:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 02:13:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 02:13:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 02:13:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565428401.97639 Aug 10 02:13:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.104.55@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 02:13:31 fir-md1-s1 kernel: LustreError: Skipped 22772 previous similar messages Aug 10 02:13:53 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2eed791e00 x1633912115177696/t0(0) o36->c534882d-6030-1b8a-8c54-b433ef117432@10.9.108.56@o2ib4:27/0 lens 568/2888 e 1 to 0 dl 1565428437 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 02:13:53 fir-md1-s1 kernel: Lustre: 23747:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 02:13:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 416fe0e2-ad8c-1554-7392-7df0b48f3b43 (at 10.9.106.14@o2ib4) reconnecting Aug 10 02:13:57 fir-md1-s1 kernel: Lustre: Skipped 1172 previous similar messages Aug 10 02:15:07 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 02:15:07 fir-md1-s1 kernel: LustreError: 23584:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565428417, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2f9d93dc40/0x5d9ee6c307b4210a lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 21 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c307b42111 expref: -99 pid: 23584 timeout: 0 lvb_type: 0 Aug 10 02:16:58 fir-md1-s1 kernel: LNet: Service thread pid 23584 was inactive for 200.60s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 02:16:58 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 10 02:16:58 fir-md1-s1 kernel: Pid: 23584, comm: mdt02_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 02:16:58 fir-md1-s1 kernel: Call Trace: Aug 10 02:16:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 02:16:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 02:16:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 02:16:58 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 02:16:58 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 02:16:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 02:16:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 02:16:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 02:16:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 02:16:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 02:16:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 02:16:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565428618.23584 Aug 10 02:20:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 02:20:51 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar 
messages Aug 10 02:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 02:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 02:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a43cb128-d0d5-6a99-1d1d-8c880924d8c9 (at 10.9.107.6@o2ib4) Aug 10 02:23:15 fir-md1-s1 kernel: Lustre: Skipped 1257 previous similar messages Aug 10 02:23:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.112.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 02:23:31 fir-md1-s1 kernel: LustreError: Skipped 20993 previous similar messages Aug 10 02:23:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 28f2520b-889f-49f6-cf3c-b350acff5281 (at 10.9.115.2@o2ib4) reconnecting Aug 10 02:23:57 fir-md1-s1 kernel: Lustre: Skipped 1212 previous similar messages Aug 10 02:30:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 02:30:57 fir-md1-s1 kernel: Lustre: Skipped 54 previous similar messages Aug 10 02:32:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 02:32:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 48 previous similar messages Aug 10 02:33:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5e90b32f-f588-dfef-191f-169796896533 (at 10.8.11.36@o2ib6) Aug 10 02:33:15 fir-md1-s1 kernel: Lustre: Skipped 1257 previous similar messages Aug 10 02:33:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.18.2@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 02:33:31 fir-md1-s1 kernel: LustreError: Skipped 22631 previous similar messages Aug 10 02:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bc3c4d7f-4161-b1f9-2c95-90855fce208a (at 10.9.107.14@o2ib4) reconnecting Aug 10 02:33:57 fir-md1-s1 kernel: Lustre: Skipped 1196 previous similar messages Aug 10 02:41:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 02:41:58 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 10 02:42:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 02:42:48 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 46 previous similar messages Aug 10 02:42:56 fir-md1-s1 kernel: Lustre: 23743:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f32554da400 x1631566263066736/t0(0) o36->754860a0-0fc0-0767-c0e7-b29609b520c7@10.9.106.48@o2ib4:1/0 lens 560/2888 e 1 to 0 dl 1565430181 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 02:43:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to df993956-2257-9a73-35ef-341b2f75d156 (at 10.9.106.58@o2ib4) Aug 10 02:43:16 fir-md1-s1 kernel: Lustre: Skipped 1249 previous similar messages Aug 10 02:43:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.14@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 02:43:31 fir-md1-s1 kernel: LustreError: Skipped 21304 previous similar messages Aug 10 02:43:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7c13660c-b743-3f11-23de-3221a2e02958 (at 10.9.106.40@o2ib4) reconnecting Aug 10 02:43:57 fir-md1-s1 kernel: Lustre: Skipped 1209 previous similar messages Aug 10 02:44:11 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565430161, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f58bb7980/0x5d9ee6c30a52aafa lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 59 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23746 timeout: 0 lvb_type: 0 Aug 10 02:44:11 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 10 02:46:02 fir-md1-s1 kernel: LNet: Service thread pid 23746 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 02:46:02 fir-md1-s1 kernel: Pid: 23746, comm: mdt02_097 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 02:46:02 fir-md1-s1 kernel: Call Trace: Aug 10 02:46:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 02:46:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 02:46:02 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 02:46:02 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 02:46:02 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 02:46:02 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 02:46:02 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 02:46:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 02:46:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 02:46:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 02:46:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 02:46:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 02:46:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 02:46:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565430362.23746 Aug 10 02:52:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 02:52:09 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 10 02:52:50 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 02:52:50 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 02:53:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5ba49a0b-6992-5fed-3d51-62bafc05f9db (at 10.9.104.56@o2ib4) Aug 10 02:53:17 fir-md1-s1 kernel: Lustre: Skipped 1275 previous similar messages Aug 10 02:53:31 fir-md1-s1 kernel: 
LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.109.21@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 02:53:31 fir-md1-s1 kernel: LustreError: Skipped 22090 previous similar messages Aug 10 02:53:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f4416ea3-3893-b0bb-aa3a-1c37eb4885e9 (at 10.9.105.28@o2ib4) reconnecting Aug 10 02:53:58 fir-md1-s1 kernel: Lustre: Skipped 1241 previous similar messages Aug 10 03:02:53 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 03:02:53 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 03:03:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 03:03:17 fir-md1-s1 kernel: Lustre: Skipped 1295 previous similar messages Aug 10 03:03:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.102.21@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 03:03:31 fir-md1-s1 kernel: LustreError: Skipped 21103 previous similar messages Aug 10 03:03:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0a76f504-1306-a831-1f93-856480da5211 (at 10.8.9.10@o2ib6) reconnecting Aug 10 03:03:58 fir-md1-s1 kernel: Lustre: Skipped 1230 previous similar messages Aug 10 03:06:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 03:06:16 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 10 03:11:00 fir-md1-s1 kernel: Lustre: 21332:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2fbccbad00 x1631549567423888/t0(0) o36->362621d0-7ac3-9c5b-280e-e0d76da4f0b2@10.9.106.66@o2ib4:5/0 lens 560/2888 e 1 to 0 dl 1565431865 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 03:12:16 fir-md1-s1 kernel: LustreError: 23741:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565431845, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f31aaafbcc0/0x5d9ee6c30c85e20d lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 60 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23741 timeout: 0 lvb_type: 0 Aug 10 03:12:54 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 03:12:54 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 03:13:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c2cdc4ec-0f2d-6e27-c5db-3cfc24b4505c (at 10.9.106.15@o2ib4) Aug 10 03:13:18 fir-md1-s1 kernel: Lustre: Skipped 1279 previous similar messages Aug 10 03:13:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.22@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 10 03:13:31 fir-md1-s1 kernel: LustreError: Skipped 22142 previous similar messages Aug 10 03:13:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 190e8c90-938d-b7f6-84df-7662b8e78e53 (at 10.9.107.71@o2ib4) reconnecting Aug 10 03:13:59 fir-md1-s1 kernel: Lustre: Skipped 1249 previous similar messages Aug 10 03:14:06 fir-md1-s1 kernel: LNet: Service thread pid 23741 was inactive for 200.68s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 03:14:06 fir-md1-s1 kernel: Pid: 23741, comm: mdt02_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 03:14:06 fir-md1-s1 kernel: Call Trace: Aug 10 03:14:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 03:14:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 03:14:06 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 03:14:06 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 03:14:06 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 03:14:06 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 03:14:06 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 03:14:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 03:14:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 03:14:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 03:14:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 03:14:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 03:14:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 03:14:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565432046.23741 Aug 10 03:16:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 03:16:17 fir-md1-s1 kernel: Lustre: 
Skipped 59 previous similar messages Aug 10 03:22:56 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 03:22:56 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 03:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b352883b-c2c1-ae27-6f40-83f9a605942e (at 10.9.106.48@o2ib4) Aug 10 03:23:18 fir-md1-s1 kernel: Lustre: Skipped 1298 previous similar messages Aug 10 03:23:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.104.62@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 03:23:31 fir-md1-s1 kernel: LustreError: Skipped 21009 previous similar messages Aug 10 03:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 569c80f1-e322-40ae-cf23-d3ca8807a6fa (at 10.9.102.40@o2ib4) reconnecting Aug 10 03:23:59 fir-md1-s1 kernel: Lustre: Skipped 1268 previous similar messages Aug 10 03:26:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 03:26:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 03:32:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 03:32:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 03:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to bd290cb8-4233-63c7-3f80-b3bae5c2df00 (at 10.9.101.19@o2ib4) Aug 10 03:33:19 fir-md1-s1 kernel: Lustre: Skipped 1275 previous similar messages Aug 10 03:33:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.9.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 03:33:31 fir-md1-s1 kernel: LustreError: Skipped 22510 previous similar messages Aug 10 03:33:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4fd3697b-8ac3-d03c-d547-c2a2aae5b292 (at 10.8.28.8@o2ib6) reconnecting Aug 10 03:33:59 fir-md1-s1 kernel: Lustre: Skipped 1261 previous similar messages Aug 10 03:39:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 03:39:07 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 03:43:01 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 03:43:01 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 03:43:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b610ffdd-4311-25ec-bea8-99a62cc5a9d2 (at 10.8.28.2@o2ib6) Aug 10 03:43:19 fir-md1-s1 kernel: Lustre: Skipped 1288 previous similar messages Aug 10 03:43:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.107.46@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 03:43:31 fir-md1-s1 kernel: LustreError: Skipped 21420 previous similar messages Aug 10 03:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client e7b57212-3a7e-4064-e6ff-77f892effff8 (at 10.9.109.22@o2ib4) reconnecting Aug 10 03:44:00 fir-md1-s1 kernel: Lustre: Skipped 1262 previous similar messages Aug 10 03:50:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 03:50:04 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 03:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0970ddf2-4674-37e8-9b93-0b8458d177fd (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f38a915dc00, cur 1565434213 expire 1565434063 last 1565433986 Aug 10 03:51:40 fir-md1-s1 kernel: Lustre: 23676:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565434293/real 1565434293] req@ffff8f2d23273900 x1636760962192240/t0(0) o106->fir-MDT0000@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565434300 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 03:51:40 fir-md1-s1 kernel: Lustre: 23676:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 03:53:02 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 03:53:02 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 03:53:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9990fede-164c-cd39-0e02-b7044df13ad7 (at 10.8.22.5@o2ib6) Aug 10 03:53:19 fir-md1-s1 kernel: Lustre: Skipped 1286 previous similar messages Aug 10 03:53:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.17.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 03:53:31 fir-md1-s1 kernel: LustreError: Skipped 22099 previous similar messages Aug 10 03:54:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 39e76845-4976-21c9-38bb-bb738759d72c (at 10.9.0.64@o2ib4) reconnecting Aug 10 03:54:00 fir-md1-s1 kernel: Lustre: Skipped 1266 previous similar messages Aug 10 04:02:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 04:02:10 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 10 04:03:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 04:03:05 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 04:03:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d974d009-1a97-5902-2190-9cfa01dc4ba2 (at 10.9.108.7@o2ib4) Aug 10 04:03:20 fir-md1-s1 kernel: Lustre: Skipped 1294 previous similar messages Aug 10 04:03:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.102.59@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 04:03:31 fir-md1-s1 kernel: LustreError: Skipped 21449 previous similar messages Aug 10 04:04:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 416fe0e2-ad8c-1554-7392-7df0b48f3b43 (at 10.9.106.14@o2ib4) reconnecting Aug 10 04:04:01 fir-md1-s1 kernel: Lustre: Skipped 1256 previous similar messages Aug 10 04:11:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f208de69c00, cur 1565435506 expire 1565435356 last 1565435279 Aug 10 04:11:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 04:12:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 04:12:26 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 10 04:13:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 04:13:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 04:13:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f60c199d-7611-7247-14ce-916a8ab83213 (at 10.9.112.13@o2ib4) Aug 10 04:13:20 fir-md1-s1 kernel: Lustre: Skipped 1289 previous similar messages Aug 10 04:13:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.113.8@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 04:13:31 fir-md1-s1 kernel: LustreError: Skipped 21837 previous similar messages Aug 10 04:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4072b1d6-edd3-180c-dea8-8ff7b460e07f (at 10.9.109.70@o2ib4) reconnecting Aug 10 04:14:02 fir-md1-s1 kernel: Lustre: Skipped 1263 previous similar messages Aug 10 04:16:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1d23724800, cur 1565435812 expire 1565435662 last 1565435585 Aug 10 04:22:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 04:22:29 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 10 04:23:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 04:23:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 04:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 69dbe640-aa39-514f-7f18-531d66b56356 (at 10.9.105.28@o2ib4) Aug 10 04:23:21 fir-md1-s1 kernel: Lustre: Skipped 1320 previous similar messages Aug 10 04:23:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.106.8@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 04:23:32 fir-md1-s1 kernel: LustreError: Skipped 21311 previous similar messages Aug 10 04:24:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b781acea-774c-fec7-dd6f-6675c4ad7bbc (at 10.9.104.36@o2ib4) reconnecting Aug 10 04:24:02 fir-md1-s1 kernel: Lustre: Skipped 1265 previous similar messages Aug 10 04:32:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 04:32:50 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 10 04:33:12 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 04:33:12 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 04:33:21 fir-md1-s1 kernel: Lustre: 22287:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ec2040000 x1631634610406128/t0(0) 
o36->c4a74d2b-de98-9a37-7ebb-5f19657dadd1@10.9.108.2@o2ib4:26/0 lens 520/2888 e 1 to 0 dl 1565436806 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 04:33:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Aug 10 04:33:21 fir-md1-s1 kernel: Lustre: Skipped 1288 previous similar messages Aug 10 04:33:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.19.1@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 04:33:32 fir-md1-s1 kernel: LustreError: Skipped 22172 previous similar messages Aug 10 04:34:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 38516069-da41-9b1a-5b22-4b6fc1dfa003 (at 10.9.107.12@o2ib4) reconnecting Aug 10 04:34:03 fir-md1-s1 kernel: Lustre: Skipped 1264 previous similar messages Aug 10 04:34:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6bd2b3e9-dfbd-146a-9ca1-647fabdaf6f7 (at 10.8.26.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f89ea800, cur 1565436845 expire 1565436695 last 1565436618 Aug 10 04:34:36 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 10 04:34:36 fir-md1-s1 kernel: LustreError: 97662:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565436786, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f248a4898c0/0x5d9ee6c311775800 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 22 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6c311775807 expref: -99 pid: 97662 timeout: 0 lvb_type: 0 Aug 10 04:35:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e68c090a-7e46-8360-af53-54256297be7a (at 10.8.23.8@o2ib6) in 207 seconds. 
I think it's dead, and I am evicting it. exp ffff8f3510714000, cur 1565436921 expire 1565436771 last 1565436714 Aug 10 04:35:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 04:36:26 fir-md1-s1 kernel: LNet: Service thread pid 97662 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 10 04:36:26 fir-md1-s1 kernel: Pid: 97662, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 04:36:26 fir-md1-s1 kernel: Call Trace: Aug 10 04:36:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 04:36:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 10 04:36:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 10 04:36:26 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 10 04:36:26 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 04:36:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 04:36:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 04:36:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 04:36:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 04:36:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 04:36:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 04:36:27 
fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565436987.97662 Aug 10 04:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d57b739c-5a57-6abf-bc24-4e84c50305b3 (at 10.8.26.28@o2ib6) in 194 seconds. I think it's dead, and I am evicting it. exp ffff8f353b69c000, cur 1565436997 expire 1565436847 last 1565436803 Aug 10 04:36:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 04:37:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a5ea76a4-f5f8-f45b-a6b9-9a35f19ba3eb (at 10.8.26.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251fc97000, cur 1565437030 expire 1565436880 last 1565436803 Aug 10 04:43:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 10 04:43:22 fir-md1-s1 kernel: Lustre: Skipped 1332 previous similar messages Aug 10 04:43:27 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 04:43:27 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 04:43:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.36@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 04:43:32 fir-md1-s1 kernel: LustreError: Skipped 21379 previous similar messages Aug 10 04:43:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 04:43:54 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 10 04:44:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client da2044d0-4d1f-46be-9f3b-250354ced4dc (at 10.9.106.2@o2ib4) reconnecting Aug 10 04:44:03 fir-md1-s1 kernel: Lustre: Skipped 1304 previous similar messages Aug 10 04:53:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 85548aa4-f88f-33fe-b242-e96ff0dc50db (at 10.9.112.17@o2ib4) Aug 10 04:53:22 fir-md1-s1 kernel: Lustre: Skipped 1352 previous similar messages Aug 10 04:53:30 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 04:53:30 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 04:53:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.109.16@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 04:53:32 fir-md1-s1 kernel: LustreError: Skipped 22105 previous similar messages Aug 10 04:54:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1789d80e-ff23-2d17-851e-02c315f81c99 (at 10.9.108.36@o2ib4) reconnecting Aug 10 04:54:03 fir-md1-s1 kernel: Lustre: Skipped 1303 previous similar messages Aug 10 04:54:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 04:54:10 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 10 05:03:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0b766838-89ea-3d2e-06ca-f7727d84cf43 (at 10.8.28.8@o2ib6) Aug 10 05:03:23 fir-md1-s1 kernel: Lustre: Skipped 1305 previous similar messages Aug 10 05:03:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.104.31@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 05:03:32 fir-md1-s1 kernel: LustreError: Skipped 21629 previous similar messages Aug 10 05:03:33 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 05:03:33 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 05:04:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client e7b57212-3a7e-4064-e6ff-77f892effff8 (at 10.9.109.22@o2ib4) reconnecting Aug 10 05:04:05 fir-md1-s1 kernel: Lustre: Skipped 1294 previous similar messages Aug 10 05:04:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 05:04:13 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 05:05:52 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2b23b57450 x1636458812809328/t0(0) o3->cfbfc9b7-8744-022c-cf1b-e1b223604a4f@10.9.108.48@o2ib4:27/0 lens 488/440 e 0 to 0 dl 
1565438757 ref 1 fl Interpret:/0/0 rc 0/0 Aug 10 05:05:52 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 3 previous similar messages Aug 10 05:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with cfbfc9b7-8744-022c-cf1b-e1b223604a4f (at 10.9.108.48@o2ib4), client will retry: rc -107 Aug 10 05:05:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 05:13:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6fc94419-5699-05e1-de93-14bdcab0c270 (at 10.9.109.22@o2ib4) Aug 10 05:13:23 fir-md1-s1 kernel: Lustre: Skipped 1333 previous similar messages Aug 10 05:13:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.102.70@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 05:13:32 fir-md1-s1 kernel: LustreError: Skipped 21776 previous similar messages Aug 10 05:13:34 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 05:13:34 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 05:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ad682b46-15bf-0f3a-4bf3-bd0a52dcefe5 (at 10.9.105.8@o2ib4) reconnecting Aug 10 05:14:06 fir-md1-s1 kernel: Lustre: Skipped 1305 previous similar messages Aug 10 05:14:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 05:14:26 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 05:17:41 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d369512e-d185-62f6-08d4-359201134279 (at 10.8.23.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2533733c00, cur 1565439461 expire 1565439311 last 1565439234 Aug 10 05:17:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 05:17:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 22c2b505-9472-beda-951d-3e1cac342b49 (at 10.8.23.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4506bfdc00, cur 1565439464 expire 1565439314 last 1565439237 Aug 10 05:17:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 05:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ce467344-71ab-c408-61ed-f15c4e8334ab (at 10.8.20.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2eedc0e800, cur 1565439641 expire 1565439491 last 1565439414 Aug 10 05:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 31c298c1-4d96-81ef-541a-e70384debecf (at 10.9.106.60@o2ib4) Aug 10 05:23:24 fir-md1-s1 kernel: Lustre: Skipped 1338 previous similar messages Aug 10 05:23:25 fir-md1-s1 kernel: Lustre: 23710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565439798/real 1565439798] req@ffff8f2682fcc800 x1636760967219184/t0(0) o106->fir-MDT0000@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565439805 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 05:23:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.8.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 05:23:32 fir-md1-s1 kernel: LustreError: Skipped 21477 previous similar messages Aug 10 05:23:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 05:23:37 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 05:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 416fe0e2-ad8c-1554-7392-7df0b48f3b43 (at 10.9.106.14@o2ib4) reconnecting Aug 10 05:24:07 fir-md1-s1 kernel: Lustre: Skipped 1302 previous similar messages Aug 10 05:26:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 05:26:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 10 05:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 788fc37f-342c-bd89-655e-98bd30378b63 (at 10.9.108.46@o2ib4) Aug 10 05:33:24 fir-md1-s1 kernel: Lustre: Skipped 1310 previous similar messages Aug 10 05:33:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.21.25@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 05:33:32 fir-md1-s1 kernel: LustreError: Skipped 21623 previous similar messages Aug 10 05:33:40 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 1 seconds Aug 10 05:33:40 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 05:34:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 39e76845-4976-21c9-38bb-bb738759d72c (at 10.9.0.64@o2ib4) reconnecting Aug 10 05:34:07 fir-md1-s1 kernel: Lustre: Skipped 1302 previous similar messages Aug 10 05:35:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27d4bbfc00, cur 1565440511 expire 1565440361 last 1565440284 Aug 10 05:35:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 05:36:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 05:36:33 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 05:43:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 35172f51-f276-7fee-5211-429a38b2533d (at 10.9.106.66@o2ib4) Aug 10 05:43:24 fir-md1-s1 kernel: Lustre: Skipped 1333 previous similar messages Aug 10 05:43:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.112.9@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 05:43:32 fir-md1-s1 kernel: LustreError: Skipped 21517 previous similar messages Aug 10 05:43:41 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 05:43:41 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 05:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b781acea-774c-fec7-dd6f-6675c4ad7bbc (at 10.9.104.36@o2ib4) reconnecting Aug 10 05:44:08 fir-md1-s1 kernel: Lustre: Skipped 1294 previous similar messages Aug 10 05:49:47 fir-md1-s1 kernel: LustreError: 21454:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f44375f7850 x1636458827203440/t0(0) o3->cfbfc9b7-8744-022c-cf1b-e1b223604a4f@10.9.108.48@o2ib4:23/0 lens 488/440 e 0 to 0 dl 1565441393 ref 1 fl Interpret:/0/0 rc 0/0 Aug 10 05:49:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with cfbfc9b7-8744-022c-cf1b-e1b223604a4f (at 10.9.108.48@o2ib4), client will retry: rc -107 Aug 10 05:49:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 05:49:59 fir-md1-s1 
kernel: Lustre: Skipped 68 previous similar messages Aug 10 05:53:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 417eb8fb-f2a2-882b-b26c-14688b7b9170 (at 10.9.108.39@o2ib4) Aug 10 05:53:25 fir-md1-s1 kernel: Lustre: Skipped 1336 previous similar messages Aug 10 05:53:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.20.14@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 05:53:32 fir-md1-s1 kernel: LustreError: Skipped 21821 previous similar messages Aug 10 05:53:43 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 05:53:43 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 05:54:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 38516069-da41-9b1a-5b22-4b6fc1dfa003 (at 10.9.107.12@o2ib4) reconnecting Aug 10 05:54:09 fir-md1-s1 kernel: Lustre: Skipped 1290 previous similar messages Aug 10 05:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f74b84400, cur 1565441849 expire 1565441699 last 1565441622 Aug 10 06:01:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 06:01:24 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 10 06:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6ba987ac-0e77-3369-f332-35b2bed682b7 (at 10.9.106.50@o2ib4) Aug 10 06:03:25 fir-md1-s1 kernel: Lustre: Skipped 1315 previous similar messages Aug 10 06:03:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.1.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 06:03:32 fir-md1-s1 kernel: LustreError: Skipped 21755 previous similar messages Aug 10 06:03:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 06:03:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 06:04:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ca693efe-e963-3124-a59d-0beac55f4de3 (at 10.9.112.17@o2ib4) reconnecting Aug 10 06:04:10 fir-md1-s1 kernel: Lustre: Skipped 1301 previous similar messages Aug 10 06:11:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 06:11:26 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 10 06:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 69dbe640-aa39-514f-7f18-531d66b56356 (at 10.9.105.28@o2ib4) Aug 10 06:13:25 fir-md1-s1 kernel: Lustre: Skipped 1336 previous similar messages Aug 10 06:13:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.107.16@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 06:13:33 fir-md1-s1 kernel: LustreError: Skipped 21604 previous similar messages Aug 10 06:13:45 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 06:13:45 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 06:14:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4fd3697b-8ac3-d03c-d547-c2a2aae5b292 (at 10.8.28.8@o2ib6) reconnecting Aug 10 06:14:11 fir-md1-s1 kernel: Lustre: Skipped 1284 previous similar messages Aug 10 06:16:38 fir-md1-s1 kernel: Lustre: 21430:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565442991/real 1565442991] req@ffff8f21fd225100 x1636760970049760/t0(0) o106->fir-MDT0000@10.8.10.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565442998 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 06:18:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9af7a171-0ce3-a101-0b9d-d31b53b3f9c6 (at 10.8.20.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2cef636c00, cur 1565443086 expire 1565442936 last 1565442859 Aug 10 06:21:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 06:21:27 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 10 06:23:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0a855284-c89f-aa4a-1498-3c8d9206b44d (at 10.8.9.10@o2ib6) Aug 10 06:23:25 fir-md1-s1 kernel: Lustre: Skipped 1314 previous similar messages Aug 10 06:23:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.29@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 06:23:33 fir-md1-s1 kernel: LustreError: Skipped 21859 previous similar messages Aug 10 06:23:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 2 seconds Aug 10 06:23:46 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 06:24:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client e7b57212-3a7e-4064-e6ff-77f892effff8 (at 10.9.109.22@o2ib4) reconnecting Aug 10 06:24:11 fir-md1-s1 kernel: Lustre: Skipped 1291 previous similar messages Aug 10 06:31:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 06:31:29 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 10 06:33:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8ba50a96-f3d9-3920-760c-8aedb752cbea (at 10.9.107.71@o2ib4) Aug 10 06:33:26 fir-md1-s1 kernel: Lustre: Skipped 1353 previous similar messages Aug 10 06:33:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.45@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 06:33:33 fir-md1-s1 kernel: LustreError: Skipped 21374 previous similar messages Aug 10 06:33:52 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 06:33:52 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 06:34:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ad682b46-15bf-0f3a-4bf3-bd0a52dcefe5 (at 10.9.105.8@o2ib4) reconnecting Aug 10 06:34:12 fir-md1-s1 kernel: Lustre: Skipped 1288 previous similar messages Aug 10 06:43:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 10 06:43:26 fir-md1-s1 kernel: Lustre: Skipped 1315 previous similar messages Aug 10 06:43:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.104.72@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 06:43:33 fir-md1-s1 kernel: LustreError: Skipped 21731 previous similar messages Aug 10 06:43:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 06:43:53 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 10 06:43:54 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 06:43:54 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 06:44:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 416fe0e2-ad8c-1554-7392-7df0b48f3b43 (at 10.9.106.14@o2ib4) reconnecting Aug 10 06:44:12 fir-md1-s1 kernel: Lustre: Skipped 1284 previous similar messages Aug 10 06:49:38 fir-md1-s1 kernel: Lustre: 10559:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f4507554b00 x1635092296958672/t0(0) o36->d8fef078-1696-f96a-b12e-042f67fec1a7@10.9.109.35@o2ib4:12/0 lens 560/2888 e 1 to 0 dl 1565444982 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 06:50:52 fir-md1-s1 kernel: LustreError: 10309:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565444962, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3e93655a00/0x5d9ee6c315b7611c lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 62 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10309 timeout: 0 lvb_type: 0 Aug 10 06:52:43 fir-md1-s1 kernel: LNet: Service thread pid 10309 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 10 06:52:43 fir-md1-s1 kernel: Pid: 10309, comm: mdt03_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 10 06:52:43 fir-md1-s1 kernel: Call Trace: Aug 10 06:52:43 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 10 06:52:43 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 10 06:52:43 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 10 06:52:43 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 10 06:52:43 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 10 06:52:43 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 10 06:52:43 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 10 06:52:43 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 10 06:52:43 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 10 06:52:43 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 10 06:52:43 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 10 06:52:43 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 10 06:52:43 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 10 06:52:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565445163.10309 Aug 10 06:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0b766838-89ea-3d2e-06ca-f7727d84cf43 (at 10.8.28.8@o2ib6) Aug 10 06:53:27 fir-md1-s1 kernel: Lustre: Skipped 1350 previous similar messages Aug 10 06:53:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.7.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 06:53:33 fir-md1-s1 kernel: LustreError: Skipped 21511 previous similar messages Aug 10 06:53:54 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 06:53:54 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages Aug 10 06:54:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ee2fc29b-a70b-6d11-2477-6e5c3f3348b3 (at 10.8.20.18@o2ib6) reconnecting Aug 10 06:54:12 fir-md1-s1 kernel: Lustre: Skipped 1317 previous similar messages Aug 10 06:55:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 06:55:12 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 10 07:03:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6fc94419-5699-05e1-de93-14bdcab0c270 (at 10.9.109.22@o2ib4) Aug 10 07:03:27 fir-md1-s1 kernel: Lustre: Skipped 1361 previous similar messages Aug 10 07:03:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.25.32@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 07:03:33 fir-md1-s1 kernel: LustreError: Skipped 21871 previous similar messages Aug 10 07:04:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d1d37c59-1aef-cee4-6611-3ad516d77ba1 (at 10.9.108.39@o2ib4) reconnecting Aug 10 07:04:12 fir-md1-s1 kernel: Lustre: Skipped 1318 previous similar messages Aug 10 07:05:01 fir-md1-s1 kernel: Lustre: 20439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565445894/real 1565445894] req@ffff8f0a6da74200 x1636760972669808/t0(0) o104->MGS@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565445901 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:05:02 fir-md1-s1 kernel: Lustre: 20439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565445895/real 1565445895] req@ffff8f1917846f00 x1636760973030288/t0(0) o104->MGS@10.8.22.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565445902 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:05:02 fir-md1-s1 kernel: Lustre: 20439:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 10 07:05:05 fir-md1-s1 kernel: Lustre: 20439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565445894/real 1565445894] req@ffff8f37663b5d00 x1636760972648944/t0(0) o104->MGS@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565445905 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:05:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 07:05:16 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 10 07:06:33 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:06:33 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565445903, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: 
MGC10.0.10.51@o2ib7 lock: ffff8f2c8bcfde80/0x5d9ee6c3167dfa70 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c3167dfa77 expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:06:33 fir-md1-s1 kernel: LustreError: 31245:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f348832af00) refcount nonzero (1) after lock cleanup; forcing cleanup. Aug 10 07:06:37 fir-md1-s1 kernel: LustreError: 20439:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565445907, 90s ago); not entering recovery in server code, just going back to sleep ns: MGS lock: ffff8f2e31579b00/0x5d9ee6c3167cdb1c lrc: 3/0,1 mode: --/EX res: [0x726966:0x2:0x0].0x0 rrc: 1397 type: PLN flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20439 timeout: 0 lvb_type: 0 Aug 10 07:08:03 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:08:03 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565445993, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f2c8bcfb180/0x5d9ee6c31682b21a lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31682b221 expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:08:03 fir-md1-s1 kernel: LustreError: 31274:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f348832a9c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:09:40 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:09:40 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446090, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f2c8bcfe0c0/0x5d9ee6c31682cf16 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31682cf1d expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:09:40 fir-md1-s1 kernel: LustreError: 31298:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f2429919380) refcount nonzero (1) after lock cleanup; forcing cleanup. Aug 10 07:11:11 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:11:11 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446180, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f2c8bcfbf00/0x5d9ee6c3168304e6 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c3168304ed expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:11:11 fir-md1-s1 kernel: LustreError: 31330:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f2429919740) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:12:49 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:12:49 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446279, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f44b4814380/0x5d9ee6c31683c287 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31683c28e expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:12:49 fir-md1-s1 kernel: LustreError: 31355:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f24299189c0) refcount nonzero (1) after lock cleanup; forcing cleanup. Aug 10 07:13:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a4efabe6-b524-c6c8-7da8-a22ef57bfe19 (at 10.9.105.8@o2ib4) Aug 10 07:13:28 fir-md1-s1 kernel: Lustre: Skipped 2756 previous similar messages Aug 10 07:13:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.11.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 07:13:41 fir-md1-s1 kernel: LustreError: Skipped 4853 previous similar messages Aug 10 07:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6bb1b23c-28f8-153d-8cc1-2ff0115f9167 (at 10.9.106.58@o2ib4) reconnecting Aug 10 07:14:13 fir-md1-s1 kernel: Lustre: Skipped 1323 previous similar messages Aug 10 07:14:19 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:14:19 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446369, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f44b4812400/0x5d9ee6c31683e0b7 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31683e0be expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:14:19 fir-md1-s1 kernel: LustreError: 31386:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f1e9df33800) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:15:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 10 07:15:50 fir-md1-s1 kernel: Lustre: Skipped 2795 previous similar messages Aug 10 07:15:58 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:15:58 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446468, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f44b4817080/0x5d9ee6c316840301 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c316840308 expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:15:58 fir-md1-s1 kernel: LustreError: 31428:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f1e9df335c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:17:20 fir-md1-s1 kernel: Lustre: 20201:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565445933/real 1565445933] req@ffff8f2d23273000 x1636760973096800/t0(0) o400->fir-MDT0003-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/224 e 27 to 1 dl 1565446640 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1 Aug 10 07:17:20 fir-md1-s1 kernel: Lustre: 20201:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 10 07:17:28 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:17:28 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446558, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f4020450900/0x5d9ee6c31684b5ce lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31684b5d5 expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:17:28 fir-md1-s1 kernel: LustreError: 31460:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f0e6aa635c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:17:54 fir-md1-s1 kernel: Lustre: 20201:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565446018/real 1565446018] req@ffff8f31bfed6300 x1636760973130432/t0(0) o400->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/224 e 25 to 1 dl 1565446674 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1 Aug 10 07:17:54 fir-md1-s1 kernel: Lustre: 20201:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 07:19:07 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:19:07 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446657, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f3e8164ca40/0x5d9ee6c31684f816 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31684f81d expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:19:07 fir-md1-s1 kernel: LustreError: 31486:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f0e6aa62240) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:20:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 10 07:20:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Aug 10 07:20:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (30): c: 5, oc: 0, rc: 8 Aug 10 07:20:28 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Aug 10 07:22:13 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:22:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 10 07:22:13 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565446843, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f3ab85d4380/0x5d9ee6c316860446 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31686044d expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:22:13 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 10 07:22:13 fir-md1-s1 kernel: LustreError: 31550:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f0e6aa63380) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:22:13 fir-md1-s1 kernel: LustreError: 31550:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Aug 10 07:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.52@o2ib7: 0 seconds Aug 10 07:22:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 43 previous similar messages Aug 10 07:23:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6855e9e0-65ba-17ef-48f8-cf674cb5aba9 (at 10.9.106.14@o2ib4) Aug 10 07:23:28 fir-md1-s1 kernel: Lustre: Skipped 4125 previous similar messages Aug 10 07:23:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.9@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 07:23:41 fir-md1-s1 kernel: LustreError: Skipped 17375 previous similar messages Aug 10 07:23:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b3678950-2d02-7c65-1df2-b3f531925bdc (at 10.0.10.52@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15ae814000, cur 1565447025 expire 1565446875 last 1565446798 Aug 10 07:23:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 07:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 362621d0-7ac3-9c5b-280e-e0d76da4f0b2 (at 10.9.106.66@o2ib4) reconnecting Aug 10 07:24:13 fir-md1-s1 kernel: Lustre: Skipped 1331 previous similar messages Aug 10 07:26:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 07:26:37 fir-md1-s1 kernel: Lustre: Skipped 2805 previous similar messages Aug 10 07:26:51 fir-md1-s1 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Aug 10 07:26:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 10 07:26:51 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565447121, 90s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8f36cfbd9d40/0x5d9ee6c31686a4d6 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x5d9ee6c31686a4dd expref: -99 pid: 20386 timeout: 0 lvb_type: 0 Aug 10 07:26:51 fir-md1-s1 kernel: LustreError: 20386:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Aug 10 07:26:51 fir-md1-s1 kernel: LustreError: 31616:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8f348832a900) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Aug 10 07:26:51 fir-md1-s1 kernel: LustreError: 31616:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 2 previous similar messages Aug 10 07:33:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5e90b32f-f588-dfef-191f-169796896533 (at 10.8.11.36@o2ib6) Aug 10 07:33:28 fir-md1-s1 kernel: Lustre: Skipped 4142 previous similar messages Aug 10 07:33:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.108.46@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 07:33:42 fir-md1-s1 kernel: LustreError: Skipped 18434 previous similar messages Aug 10 07:34:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0a76f504-1306-a831-1f93-856480da5211 (at 10.8.9.10@o2ib6) reconnecting Aug 10 07:34:13 fir-md1-s1 kernel: Lustre: Skipped 1323 previous similar messages Aug 10 07:35:01 fir-md1-s1 kernel: LNet: Service thread pid 23562 completed after 24035.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 10 07:35:01 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (21020:1s); client may timeout. 
req@ffff8f275b24cb00 x1631385943492384/t357626360529(0) o36->3926cda2-471a-9775-ffa6-15d857ceb079@10.8.13.2@o2ib6:10/0 lens 528/424 e 0 to 0 dl 1565447700 ref 1 fl Complete:/0/0 rc 0/0 Aug 10 07:35:01 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 10 07:36:15 fir-md1-s1 kernel: Lustre: 20457:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447768/real 1565447768] req@ffff8f0998da0600 x1636760974286352/t0(0) o104->fir-MDT0000@10.9.106.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565447775 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:36:15 fir-md1-s1 kernel: Lustre: 20457:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 07:36:22 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447775/real 1565447775] req@ffff8f2d3bb86600 x1636760974363952/t0(0) o105->MGS@10.8.11.6@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1565447782 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:36:29 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447782/real 1565447782] req@ffff8f2d3bb81800 x1636760974364320/t0(0) o105->MGS@10.9.106.13@o2ib4:15/16 lens 304/224 e 0 to 1 dl 1565447789 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:36:29 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 10 07:36:30 fir-md1-s1 kernel: Lustre: 48114:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1dd1645100 x1638946275969904/t0(0) o103->4b6690b3-7063-372c-0cb3-446cb87a70b6@10.8.30.15@o2ib6:5/0 lens 328/224 e 1 to 0 dl 1565447795 ref 2 fl Interpret:H/0/0 rc 0/0 Aug 10 07:36:33 fir-md1-s1 kernel: Lustre: 26256:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ 
Couldn't add any time (5/-5), not sending early reply req@ffff8f1601211200 x1631618778684528/t0(0) o101->2760e021-c1fe-d2a9-3b01-eeefd52010e6@10.8.7.5@o2ib6:8/0 lens 584/3264 e 0 to 0 dl 1565447798 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:36:34 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f164968cb00 x1635620984629024/t0(0) o101->d072205a-1b1b-636c-7696-e9d92af1edee@10.8.20.3@o2ib6:9/0 lens 584/3264 e 0 to 0 dl 1565447799 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:36:34 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Aug 10 07:36:36 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447789/real 1565447789] req@ffff8f2d3bb81800 x1636760974364320/t0(0) o105->MGS@10.9.106.13@o2ib4:15/16 lens 304/224 e 0 to 1 dl 1565447796 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:36:36 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Aug 10 07:36:37 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f31b1c20900 x1631554000746480/t0(0) o101->8677433a-08df-e12f-9cbe-ab844f71c9a4@10.9.106.69@o2ib4:12/0 lens 584/3264 e 0 to 0 dl 1565447802 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:36:37 fir-md1-s1 kernel: Lustre: 20996:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 10 07:36:41 fir-md1-s1 kernel: Lustre: 50448:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2215d36300 x1631440653637296/t0(0) o101->3373de9a-85b5-9e8c-3bf7-fc7b61c3cd4b@10.8.20.2@o2ib6:16/0 lens 584/3264 e 0 to 0 dl 1565447806 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:36:41 fir-md1-s1 kernel: Lustre: 
50448:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 39 previous similar messages Aug 10 07:36:43 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447796/real 1565447796] req@ffff8f2d3bb81800 x1636760974364320/t0(0) o105->MGS@10.9.106.13@o2ib4:15/16 lens 304/224 e 0 to 1 dl 1565447803 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:36:43 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Aug 10 07:36:43 fir-md1-s1 kernel: LustreError: 20457:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.13@o2ib4) failed to reply to blocking AST (req@ffff8f0998da0600 x1636760974286352 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f0e8ac27bc0/0x5d9ee6c3168dc392 lrc: 4/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 79 type: IBT flags: 0x60200400000020 nid: 10.9.106.13@o2ib4 remote: 0x2f2cb83926780d0e expref: 12 pid: 10308 timeout: 4562885 lvb_type: 0 Aug 10 07:36:43 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.106.13@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 10 07:36:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.106.13@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f0e8ac27bc0/0x5d9ee6c3168dc392 lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 79 type: IBT flags: 0x60200400000020 nid: 10.9.106.13@o2ib4 remote: 0x2f2cb83926780d0e expref: 13 pid: 10308 timeout: 0 lvb_type: 0 Aug 10 07:36:43 fir-md1-s1 kernel: Lustre: 24582:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f220c053300 x1636424595648720/t0(0) o101->095971d4-2c15-c9c6-8336-964f67ec504b@10.9.105.69@o2ib4:12/0 lens 584/536 e 0 to 0 dl 1565447802 ref 1 fl Complete:/0/0 rc 0/0 Aug 10 07:36:43 fir-md1-s1 kernel: Lustre: 24582:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 10 07:36:57 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447810/real 1565447810] req@ffff8f2d3bb81800 x1636760974364320/t0(0) o105->MGS@10.9.106.13@o2ib4:15/16 lens 304/224 e 0 to 1 dl 1565447817 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:36:57 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages Aug 10 07:37:18 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447831/real 1565447831] req@ffff8f2d3bb81800 x1636760974364320/t0(0) o105->MGS@10.9.106.13@o2ib4:15/16 lens 304/224 e 0 to 1 dl 1565447838 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:37:18 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages Aug 10 07:37:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 07:37:28 fir-md1-s1 kernel: Lustre: Skipped 2793 previous similar messages Aug 10 07:37:53 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447866/real 1565447866] req@ffff8f2d3bb81800 x1636760974364320/t0(0) o105->MGS@10.9.106.13@o2ib4:15/16 lens 304/224 e 0 to 1 dl 1565447873 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:37:53 fir-md1-s1 kernel: Lustre: 21366:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 29 previous similar messages Aug 10 07:39:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
257e47e9-78a0-a5d9-0d4b-1c08db5bc591 (at 10.9.106.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520991800, cur 1565447942 expire 1565447792 last 1565447715 Aug 10 07:39:02 fir-md1-s1 kernel: Lustre: 21366:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:147s); client may timeout. req@ffff8f1dd1645100 x1638946275969904/t0(0) o103->4b6690b3-7063-372c-0cb3-446cb87a70b6@10.8.30.15@o2ib6:5/0 lens 328/192 e 1 to 0 dl 1565447795 ref 1 fl Complete:H/0/0 rc 0/0 Aug 10 07:39:58 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565447991/real 1565447991] req@ffff8f1350adcb00 x1636760978532592/t0(0) o104->fir-MDT0000@10.9.108.36@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565447998 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:39:58 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 59 previous similar messages Aug 10 07:40:16 fir-md1-s1 kernel: Lustre: 10309:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f44b37fda00 x1641227053909264/t0(0) o101->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:21/0 lens 576/3264 e 0 to 0 dl 1565448021 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:40:16 fir-md1-s1 kernel: Lustre: 10309:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Aug 10 07:40:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0acedf63-8faf-16cc-e952-5f052a933884 (at 10.9.101.57@o2ib4) in 175 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1489482800, cur 1565448018 expire 1565447868 last 1565447843 Aug 10 07:40:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 07:40:59 fir-md1-s1 kernel: Lustre: 26254:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f224d761200 x1631773060621040/t0(0) o101->a7e1fb1c-820c-11c6-929c-c307b02e7548@10.8.22.5@o2ib6:4/0 lens 576/3264 e 0 to 0 dl 1565448064 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:40:59 fir-md1-s1 kernel: Lustre: 26254:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 07:41:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0acedf63-8faf-16cc-e952-5f052a933884 (at 10.9.101.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f35e97efc00, cur 1565448070 expire 1565447920 last 1565447843 Aug 10 07:41:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 07:41:21 fir-md1-s1 kernel: LustreError: 23582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565447991, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3b2a8c8b40/0x5d9ee6c3180501fe lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 87 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23582 timeout: 0 lvb_type: 0 Aug 10 07:41:24 fir-md1-s1 kernel: LustreError: 21677:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565447994, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f27912f86c0/0x5d9ee6c31808ce87 lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 87 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21677 timeout: 0 lvb_type: 0 Aug 10 07:41:38 fir-md1-s1 kernel: Lustre: 
26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1712144b00 x1631554006481504/t0(0) o101->8677433a-08df-e12f-9cbe-ab844f71c9a4@10.9.106.69@o2ib4:13/0 lens 584/3264 e 0 to 0 dl 1565448103 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 07:41:38 fir-md1-s1 kernel: Lustre: 26253:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages Aug 10 07:42:04 fir-md1-s1 kernel: LustreError: 21460:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565448034, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f24975ca400/0x5d9ee6c318423b75 lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 121 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21460 timeout: 0 lvb_type: 0 Aug 10 07:42:06 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565448036, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f194eacfbc0/0x5d9ee6c318452e56 lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 121 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23743 timeout: 0 lvb_type: 0 Aug 10 07:42:06 fir-md1-s1 kernel: LustreError: 23743:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 10 07:42:11 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565448124/real 1565448124] req@ffff8f1350adcb00 x1636760978532592/t0(0) o104->fir-MDT0000@10.9.108.36@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565448131 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 07:42:11 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 56 previous similar messages Aug 10 07:42:13 
fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565448043, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3010525340/0x5d9ee6c3184eac59 lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23746 timeout: 0 lvb_type: 0 Aug 10 07:42:13 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 17 previous similar messages Aug 10 07:42:23 fir-md1-s1 kernel: LustreError: 97648:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565448053, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1869d7af40/0x5d9ee6c31864e16a lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 138 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97648 timeout: 0 lvb_type: 0 Aug 10 07:42:23 fir-md1-s1 kernel: LustreError: 97648:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 10 07:42:25 fir-md1-s1 kernel: LustreError: 23687:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.108.36@o2ib4) failed to reply to blocking AST (req@ffff8f1350adcb00 x1636760978532592 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f342ee06c00/0x5d9ee6c316f6669a lrc: 4/0,0 mode: PR/PR res: [0x200029f5a:0x49:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.108.36@o2ib4 remote: 0xcc43499722020b14 expref: 381 pid: 24582 timeout: 4563347 lvb_type: 0 Aug 10 07:42:25 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.108.36@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 10 07:42:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### 
lock callback timer expired after 154s: evicting client at 10.9.108.36@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f342ee06c00/0x5d9ee6c316f6669a lrc: 3/0,0 mode: PR/PR res: [0x200029f5a:0x49:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.9.108.36@o2ib4 remote: 0xcc43499722020b14 expref: 382 pid: 24582 timeout: 0 lvb_type: 0 Aug 10 07:43:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1789d80e-ff23-2d17-851e-02c315f81c99 (at 10.9.108.36@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f8cb2bc00, cur 1565448187 expire 1565448037 last 1565447960 Aug 10 07:43:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 07:43:53 fir-md1-s1 kernel: Lustre: Skipped 1765 previous similar messages Aug 10 07:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 07:44:25 fir-md1-s1 kernel: Lustre: Skipped 247 previous similar messages Aug 10 07:47:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 07:47:52 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 10 07:50:33 fir-md1-s1 kernel: Lustre: 21677:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565448626/real 1565448626] req@ffff8f2f3aa43f00 x1636760987329904/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565448633 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:50:33 fir-md1-s1 kernel: Lustre: 21677:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 10 07:54:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 07:54:18 fir-md1-s1 kernel: Lustre: Skipped 17906 previous similar messages Aug 10 07:54:47 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 07:54:47 fir-md1-s1 kernel: Lustre: Skipped 17872 previous similar messages Aug 10 07:57:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 10 07:57:58 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 07:59:58 fir-md1-s1 kernel: Lustre: 21369:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565449191/real 1565449191] req@ffff8f0fa4979b00 x1636760992393984/t0(0) o104->fir-MDT0000@10.8.11.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565449198 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 07:59:58 fir-md1-s1 kernel: Lustre: 21369:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 08:01:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 08:01:31 fir-md1-s1 kernel: LustreError: Skipped 5003 previous similar messages Aug 10 08:04:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 10 08:04:20 fir-md1-s1 kernel: Lustre: Skipped 75 previous similar messages Aug 10 08:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 08:04:48 fir-md1-s1 kernel: Lustre: Skipped 39 previous similar messages Aug 10 08:08:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9418c733-9799-d44c-3b7e-9c4f6384b615 (at 10.8.25.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f392d61fc00, cur 1565449697 expire 1565449547 last 1565449470 Aug 10 08:08:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 08:08:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.6@o2ib6, removing former export from same NID Aug 10 08:08:30 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Aug 10 08:09:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c6c03da3-2796-a9a6-57aa-39e9dfeee895 (at 10.8.26.8@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. exp ffff8f2521d86000, cur 1565449773 expire 1565449623 last 1565449549 Aug 10 08:09:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 08:14:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d2b05ed-81cd-3d62-bb9f-e3f301bfd456 (at 10.8.11.6@o2ib6) Aug 10 08:14:23 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 10 08:14:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 08:14:52 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 10 08:18:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17e77a2800, cur 1565450338 expire 1565450188 last 1565450111 Aug 10 08:18:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 08:19:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17e26c1e-4877-4fff-89e1-78bf5463918b (at 10.8.11.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f17127c8000, cur 1565450344 expire 1565450194 last 1565450117 Aug 10 08:19:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 08:19:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.23@o2ib6, removing former export from same NID Aug 10 08:19:23 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 10 08:24:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 08:24:42 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 10 08:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 08:25:04 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 10 08:30:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 08:30:51 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 10 08:33:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 08:33:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 08:35:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 08:35:05 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 10 08:35:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 08:35:05 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 10 08:35:24 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f0c716000 x1638092007824928/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:29/0 lens 1792/3288 e 0 to 0 dl 1565451329 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 08:35:24 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 31 previous similar messages Aug 10 08:35:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2c1ef60480/0x5d9ee6c32d935bf7 lrc: 3/0,0 mode: PR/PR res: [0x2c002c360:0x50e0:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f85e46c07 expref: 922 pid: 10146 timeout: 4566388 lvb_type: 0 Aug 10 08:44:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.20@o2ib6, removing former export from same NID Aug 10 08:44:09 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 08:45:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) 
reconnecting Aug 10 08:45:13 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 10 08:45:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 08:45:13 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Aug 10 08:48:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 08:55:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 08:55:19 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 10 08:55:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 08:55:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 10 08:56:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 08:56:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 08:57:16 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3170fddd00 x1641227062291168/t0(0) o36->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:21/0 lens 496/2888 e 1 to 0 dl 1565452641 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 08:57:16 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 08:57:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1be3b09200/0x5d9ee6c3390afaed lrc: 3/0,0 mode: PR/PR res: [0x200029fa0:0xae6:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 
10.8.10.21@o2ib6 remote: 0x3771db9f85e4d77d expref: 538 pid: 50582 timeout: 4567710 lvb_type: 0 Aug 10 09:05:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 09:05:24 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 10 09:05:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 09:05:24 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 10 09:07:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 09:07:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 09:15:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 09:15:25 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 10 09:15:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 09:15:25 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 10 09:17:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 09:17:50 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 09:26:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 09:26:11 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 10 09:26:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 09:26:11 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Aug 10 09:28:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f03aa5e8-f764-2262-c217-2e99830bfe5f (at 10.8.22.34@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f24f898d000, cur 1565454490 expire 1565454340 last 1565454263 Aug 10 09:30:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 09:30:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 09:36:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 09:36:21 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 10 09:36:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 09:36:21 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 10 09:38:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 09:40:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 09:40:58 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 09:46:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 09:46:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 09:46:27 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 09:46:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 09:46:27 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 10 09:51:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 09:51:48 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 09:54:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0f075213-a4be-c82e-e718-857e6b33a4f8 (at 10.8.12.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251fc8ec00, cur 1565456072 expire 1565455922 last 1565455845 Aug 10 09:54:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 09:56:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 09:56:27 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 10 09:56:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 09:56:27 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 10 10:02:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 10:02:39 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 10:06:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 10:06:33 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 10 10:06:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 
10:06:33 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 10 10:16:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 10:16:41 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 10:16:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 10:16:41 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 10 10:17:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 10:17:03 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 10:20:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0f1febf4-4926-d855-a2ad-f268c7b55b71 (at 10.8.23.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4502c54c00, cur 1565457622 expire 1565457472 last 1565457395 Aug 10 10:20:22 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 10 10:27:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 10:27:06 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 10:27:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 10:27:06 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 10 10:27:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 10:27:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 10:37:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to dabc3f4f-46f2-c224-dc91-4feb601c74f4 (at 10.8.2.26@o2ib6) Aug 10 10:37:28 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Aug 10 10:37:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 10:37:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 10:37:50 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 10 10:37:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 10:38:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 10:38:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 10:40:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 10:46:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 10:47:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 10:47:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 10:47:35 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 10 10:48:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 10:48:15 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 10:48:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 10:48:50 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 10 10:52:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6b5a58e8-f6cd-7144-fe7f-c8e072c14f3d (at 10.8.22.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3727d4e800, cur 1565459577 expire 1565459427 last 1565459350 Aug 10 10:52:57 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 10 10:57:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 10:57:38 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 10 10:58:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 10:58:51 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 10 10:59:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 10:59:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 11:05:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 11:07:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 11:07:53 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 10 11:09:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 11:09:18 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 10 11:09:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 11:09:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 11:18:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 11:18:03 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 10 11:19:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 11:19:23 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 10 11:23:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 11:23:03 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 11:25:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 11:28:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 11:28:31 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 10 11:29:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 11:29:45 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 11:32:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 11:36:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 11:36:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 11:38:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b270ac6a-a3a3-60fc-aec3-c49e072fb0ae (at 10.8.30.23@o2ib6) Aug 10 11:38:33 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 11:39:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 11:39:53 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 11:47:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 11:48:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 11:48:33 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 10 11:49:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 11:49:39 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 11:50:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 11:50:08 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 10 11:54:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 11:58:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 11:58:41 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 10 11:59:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 11:59:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 11:59:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 12:00:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) reconnecting Aug 10 12:00:25 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 10 12:08:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) Aug 10 12:08:41 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 10 12:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) reconnecting Aug 10 12:10:27 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 10 12:10:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 12:10:39 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 12:19:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 12:19:05 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Aug 10 12:21:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 12:21:07 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 12:21:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 12:21:38 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 10 12:24:43 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565465076/real 1565465076] req@ffff8f1d502e3c00 x1636761122325088/t0(0) o104->fir-MDT0002@10.8.22.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565465083 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 12:24:43 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 10 12:24:51 
fir-md1-s1 kernel: Lustre: 24582:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1bb13f2100 x1631571599631600/t0(0) o101->02dfd968-e7b1-52cc-0db8-aa0d10c0832c@10.9.102.19@o2ib4:26/0 lens 1784/3288 e 1 to 0 dl 1565465096 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 12:24:51 fir-md1-s1 kernel: Lustre: 24582:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 12:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2dc60f0c-9f07-0bd5-cd64-f7199eae9ce1 (at 10.8.21.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4524276800, cur 1565465096 expire 1565464946 last 1565464869 Aug 10 12:24:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 12:26:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) in 208 seconds. I think it's dead, and I am evicting it. exp ffff8f097d737c00, cur 1565465172 expire 1565465022 last 1565464964 Aug 10 12:26:12 fir-md1-s1 kernel: Lustre: Skipped 161 previous similar messages Aug 10 12:26:31 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 02d0382c-ac98-fa2d-e4db-d0092db77da5 (at 10.8.22.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520badc00, cur 1565465191 expire 1565465041 last 1565464964 Aug 10 12:26:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 12:27:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6dc651d0-2b7a-dd35-f234-bffd4712bc50 (at 10.8.30.23@o2ib6) in 223 seconds. I think it's dead, and I am evicting it. 
exp ffff8f146839b400, cur 1565465248 expire 1565465098 last 1565465025 Aug 10 12:27:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 12:29:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 12:29:13 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 10 12:32:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 12:32:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 12:32:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b1066692-bc3f-57f9-40da-237404ce622e (at 10.9.101.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2981572000, cur 1565465556 expire 1565465406 last 1565465329 Aug 10 12:32:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 12:33:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 12:33:24 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 12:33:44 fir-md1-s1 kernel: Lustre: 10195:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1038b02700 x1636442828183120/t0(0) o101->829e8e6e-3608-cb1f-779c-fe5437a6c742@10.9.102.33@o2ib4:19/0 lens 576/3264 e 1 to 0 dl 1565465629 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 12:33:44 fir-md1-s1 kernel: Lustre: 10195:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 12:33:46 fir-md1-s1 kernel: Lustre: 50576:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3d8df5b900 x1641227080526160/t0(0) o101->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:21/0 lens 576/3264 e 0 to 0 dl 1565465631 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 12:34:04 fir-md1-s1 kernel: Lustre: 
10146:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f305534b600 x1634926160602752/t0(0) o101->8c55cb94-7e98-7ab0-0640-ed020030cf15@10.8.30.21@o2ib6:9/0 lens 584/3264 e 1 to 0 dl 1565465649 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 12:34:04 fir-md1-s1 kernel: Lustre: 10146:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 12:37:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 12:39:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 12:39:59 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 10 12:40:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bde76402-4bdb-3c73-0ae0-dcf361142e6d (at 10.9.114.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f358d500400, cur 1565466055 expire 1565465905 last 1565465828 Aug 10 12:40:55 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 10 12:42:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 12:42:08 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 10 12:44:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 12:44:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 12:44:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 12:50:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 12:50:11 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 12:53:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 12:53:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 12:57:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 12:57:07 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 10 13:00:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9db013e3-baa2-64bf-ffc3-97fc6cf78d20 (at 10.8.30.13@o2ib6) Aug 10 13:00:11 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Aug 10 13:04:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 13:04:56 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 13:10:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 13:10:03 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 13:10:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7ebad617-5316-c053-b847-71ed2bf33173 (at 10.8.30.29@o2ib6) Aug 10 13:10:22 fir-md1-s1 kernel: Lustre: Skipped 179 previous similar messages Aug 10 13:15:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 13:15:14 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 13:20:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 13:20:22 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 10 13:25:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 13:25:41 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 13:27:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 13:27:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 13:30:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 13:30:45 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 13:38:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 13:38:05 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 13:38:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 13:40:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 067819b6-4047-96fd-9319-8b00d70fc797 (at 10.9.108.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22ab752400, cur 1565469638 expire 1565469488 last 1565469411 Aug 10 13:40:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 13:42:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 13:42:18 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 13:45:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 13:45:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 13:48:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 13:48:56 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 13:52:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 13:52:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 13:57:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 13:57:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 13:59:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 13:59:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 14:02:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 14:02:38 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 14:10:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 14:10:27 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 14:11:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) 
reconnecting Aug 10 14:11:04 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 14:12:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 14:12:43 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 10 14:22:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 14:22:39 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 10 14:23:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 14:23:34 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 14:23:38 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565472211/real 1565472211] req@ffff8f1f1b844b00 x1636761183655040/t0(0) o104->fir-MDT0002@10.8.20.17@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565472218 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 14:23:38 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 14:23:46 fir-md1-s1 kernel: Lustre: 21429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1aec27a700 x1638784639942464/t0(0) o101->3ac1581a-a94e-22b3-2bf3-b18d4bc33b46@10.9.104.26@o2ib4:21/0 lens 1792/3288 e 1 to 0 dl 1565472231 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 14:23:52 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565472225/real 1565472225] req@ffff8f1f1b844b00 x1636761183655040/t0(0) o104->fir-MDT0002@10.8.20.17@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565472232 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 14:23:52 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar 
message Aug 10 14:23:56 fir-md1-s1 kernel: Lustre: 21369:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0689b59b00 x1641227090008944/t0(0) o101->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:1/0 lens 576/3264 e 0 to 0 dl 1565472241 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 14:24:13 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565472246/real 1565472246] req@ffff8f1f1b844b00 x1636761183655040/t0(0) o104->fir-MDT0002@10.8.20.17@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565472253 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 14:24:13 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 10 14:24:55 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565472288/real 1565472288] req@ffff8f1f1b844b00 x1636761183655040/t0(0) o104->fir-MDT0002@10.8.20.17@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565472295 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 14:24:55 fir-md1-s1 kernel: Lustre: 22283:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 10 14:25:01 fir-md1-s1 kernel: LustreError: 21127:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565472211, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f3046c269c0/0x5d9ee6c3a5464019 lrc: 3/1,0 mode: --/PR res: [0x2c002c360:0x50e0:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21127 timeout: 0 lvb_type: 0 Aug 10 14:25:51 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.17@o2ib6) returned error from blocking AST (req@ffff8f1f1b844b00 x1636761183655040 status -107 rc -107), evict it ns: 
mdt-fir-MDT0002_UUID lock: ffff8f2fcc2e3180/0x5d9ee6c3a508baeb lrc: 4/0,0 mode: PR/PR res: [0x2c002c360:0x50e0:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.20.17@o2ib6 remote: 0x86b10010b1a9dfdd expref: 1370 pid: 23685 timeout: 4587560 lvb_type: 0 Aug 10 14:25:51 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages Aug 10 14:25:51 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.17@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Aug 10 14:25:51 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 10 14:25:51 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 140s: evicting client at 10.8.20.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2fcc2e3180/0x5d9ee6c3a508baeb lrc: 3/0,0 mode: PR/PR res: [0x2c002c360:0x50e0:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.20.17@o2ib6 remote: 0x86b10010b1a9dfdd expref: 1371 pid: 23685 timeout: 0 lvb_type: 0 Aug 10 14:26:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3602abbe-50a9-7ca3-dca1-96568d2aef0d (at 10.8.20.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30d9246800, cur 1565472378 expire 1565472228 last 1565472151 Aug 10 14:26:18 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 14:26:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 14:26:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 14:36:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 14:36:00 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 10 14:36:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 14:36:55 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 14:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 14:37:54 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 10 14:38:42 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2030bcbc00 x1631783955937744/t0(0) o101->1a625ce5-039b-2212-3e18-697bf14e7a6e@10.8.24.30@o2ib6:17/0 lens 480/568 e 1 to 0 dl 1565473127 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 14:38:42 fir-md1-s1 kernel: Lustre: 22282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 10 14:38:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.10@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 14:38:54 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 10 14:39:16 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 10 14:39:18 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 10 14:39:18 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 10 14:46:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 14:46:06 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 10 14:49:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 14:49:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 10 14:50:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 14:50:03 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 14:54:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 14:56:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 14:56:45 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 15:00:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 15:00:21 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 15:00:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 15:00:48 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 15:07:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 15:07:37 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 15:12:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 15:12:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 15:13:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 15:13:38 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 15:17:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 15:17:45 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 15:22:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 15:22:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 15:23:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 15:23:49 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 15:27:49 fir-md1-s1 kernel: Lustre: MGS: Connection 
restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 15:27:49 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 15:33:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 15:33:11 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 15:34:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 15:34:41 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 15:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 15:38:51 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 15:43:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 15:43:25 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 15:45:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 15:45:10 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 15:49:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 15:49:23 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 10 15:55:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 15:55:52 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 15:57:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 15:57:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 16:00:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 
10.8.10.21@o2ib6) Aug 10 16:00:00 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 16:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 16:07:01 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 16:07:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 16:07:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 16:10:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 16:10:04 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 16:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 16:17:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 16:17:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 16:17:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 16:20:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 16:20:41 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 16:27:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 16:27:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 16:31:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 16:31:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 16:31:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 16:31:39 fir-md1-s1 kernel: Lustre: Skipped 17 
previous similar messages Aug 10 16:38:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 16:39:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 16:39:51 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 16:40:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 16:41:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 16:41:41 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 16:42:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 16:45:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 16:45:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 10 16:49:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 16:49:51 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 16:52:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 16:52:53 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 16:57:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 16:57:36 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 16:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e3e6d1fd-6534-3c18-933a-2a723830beb5 (at 10.8.2.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e99829400, cur 1565481547 expire 1565481397 last 1565481320 Aug 10 16:59:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 17:00:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 17:00:48 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 17:03:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 17:03:28 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 17:10:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 17:10:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 17:10:51 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 17:10:51 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 17:13:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 17:13:29 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 17:21:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 17:21:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 17:23:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 17:23:53 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 17:24:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 17:24:35 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 17:34:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) 
Aug 10 17:34:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 17:34:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 17:34:35 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 17:35:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 17:35:57 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 17:38:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 17:44:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 17:44:08 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 17:44:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 17:44:39 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 17:50:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 17:50:09 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 17:54:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 17:54:16 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 10 17:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 17:54:43 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 18:00:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 18:00:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous 
similar messages Aug 10 18:04:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 18:04:24 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 18:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 18:06:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 18:11:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 18:11:21 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 18:14:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 18:14:57 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 18:16:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 18:16:21 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 18:21:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 18:21:48 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 10 18:25:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 18:25:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 18:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 18:26:22 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 18:33:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 18:33:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 18:35:57 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 18:35:57 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 18:37:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 18:37:48 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 18:44:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 18:44:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 18:46:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 18:46:05 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 18:48:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 18:48:20 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 18:54:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 18:54:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 18:56:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 18:56:13 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 18:58:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 18:58:24 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 19:05:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 19:05:20 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 19:07:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 19:07:06 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 19:09:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 19:09:03 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 19:15:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 19:15:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 19:16:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3c3c13e2-7abe-84ff-6e99-341fbf61e7bc (at 10.9.108.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20c9fae800, cur 1565489805 expire 1565489655 last 1565489578 Aug 10 19:16:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 19:18:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 19:18:14 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 19:19:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 19:19:14 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 19:19:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 19:22:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 19:26:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 19:26:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 19:28:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 19:28:58 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 19:29:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 19:29:25 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 19:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 19:39:45 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 19:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 19:39:45 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 19:40:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 19:40:12 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 19:48:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 19:49:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 19:49:45 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 10 19:49:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 19:49:45 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 19:50:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 19:50:39 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 19:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 19:59:53 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 19:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 19:59:53 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 20:01:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 20:01:26 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 20:10:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 20:10:32 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 20:10:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 20:10:32 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 10 20:12:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 20:12:13 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 20:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 20:20:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 20:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 20:20:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 20:21:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 20:22:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 20:22:51 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 20:26:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 20:30:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 20:30:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 10 20:30:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 20:30:59 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 10 20:34:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 20:34:06 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 20:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 20:41:03 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 20:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 20:41:03 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 20:46:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 20:46:10 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 20:48:21 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565495294/real 1565495294] req@ffff8f21b2996600 x1636761361900400/t0(0) o106->fir-MDT0000@10.8.30.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565495301 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 20:48:21 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 10 20:48:29 fir-md1-s1 kernel: Lustre: 20460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f244d379200 x1635094446959888/t0(0) 
o101->dac1b890-e51e-7605-0de6-9aadda9d8c58@10.9.109.24@o2ib4:4/0 lens 480/568 e 1 to 0 dl 1565495314 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 20:48:35 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565495308/real 1565495308] req@ffff8f21b2996600 x1636761361900400/t0(0) o106->fir-MDT0000@10.8.30.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565495315 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 20:48:35 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 20:48:56 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565495329/real 1565495329] req@ffff8f21b2996600 x1636761361900400/t0(0) o106->fir-MDT0000@10.8.30.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565495336 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 20:48:56 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 10 20:49:38 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565495371/real 1565495371] req@ffff8f21b2996600 x1636761361900400/t0(0) o106->fir-MDT0000@10.8.30.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565495378 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 20:49:38 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 10 20:49:59 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.30.2@o2ib6) returned error from glimpse AST (req@ffff8f21b2996600 x1636761361900400 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f24e2afee40/0x5d9ee6c3b52dcfa6 lrc: 4/0,0 mode: PW/PW res: [0x20002a02c:0x7:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x40200000000000 nid: 10.8.30.2@o2ib6 remote: 0xd8b3a8b223b5ecbe expref: 168 pid: 24577 
timeout: 0 lvb_type: 0 Aug 10 20:49:59 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.30.2@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Aug 10 20:49:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 19035s: evicting client at 10.8.30.2@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f24e2afee40/0x5d9ee6c3b52dcfa6 lrc: 4/0,0 mode: PW/PW res: [0x20002a02c:0x7:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x40200000000000 nid: 10.8.30.2@o2ib6 remote: 0xd8b3a8b223b5ecbe expref: 169 pid: 24577 timeout: 0 lvb_type: 0 Aug 10 20:49:59 fir-md1-s1 kernel: Lustre: 24580:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:85s); client may timeout. req@ffff8f244d379200 x1635094446959888/t0(0) o101->dac1b890-e51e-7605-0de6-9aadda9d8c58@10.9.109.24@o2ib4:4/0 lens 480/536 e 1 to 0 dl 1565495314 ref 1 fl Complete:/0/0 rc 301/301 Aug 10 20:50:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 407237ae-447c-6768-2312-ccc095df731a (at 10.8.30.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ae81c6000, cur 1565495417 expire 1565495267 last 1565495190 Aug 10 20:50:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 20:51:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 20:51:05 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 10 20:51:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 20:51:05 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 20:52:35 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 10 21:00:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 21:01:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 21:01:28 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 21:01:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 21:01:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 10 21:04:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 21:04:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 21:07:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 21:07:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 10 21:08:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 21:09:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 21:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 21:11:32 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 21:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 21:11:32 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 21:14:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 21:14:01 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 21:14:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 21:17:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 21:19:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 21:20:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 10 21:21:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 21:21:28 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 21:21:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 21:21:58 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 21:22:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 21:22:36 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 21:23:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 21:32:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 21:32:17 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 21:32:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 21:32:44 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 10 21:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 21:34:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 21:38:19 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 10 21:42:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 21:42:53 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 21:43:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 21:43:02 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 21:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 21:44:31 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 21:53:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 21:53:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 21:54:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 21:54:41 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 10 21:56:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing 
former export from same NID Aug 10 21:56:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 22:03:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 22:03:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 22:06:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 22:06:55 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 22:07:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 22:07:15 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 22:14:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 22:14:05 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 10 22:16:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 22:16:57 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 22:17:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 22:17:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 10 22:22:31 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565500944/real 1565500944] req@ffff8f2c2af16f00 x1636761397496448/t0(0) o106->fir-MDT0000@10.8.26.28@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565500951 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 10 22:22:31 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 10 22:22:45 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for 
slow reply: [sent 1565500958/real 1565500958] req@ffff8f2c2af16f00 x1636761397496448/t0(0) o106->fir-MDT0000@10.8.26.28@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565500965 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 22:22:45 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 10 22:22:49 fir-md1-s1 kernel: Lustre: 21380:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f29f605c800 x1631713633350032/t0(0) o101->0ee78617-a93b-a65f-97e4-7caa3e6fe676@10.8.25.14@o2ib6:24/0 lens 480/568 e 0 to 0 dl 1565500974 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 22:23:06 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565500979/real 1565500979] req@ffff8f2c2af16f00 x1636761397496448/t0(0) o106->fir-MDT0000@10.8.26.28@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565500986 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 22:23:06 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 10 22:23:48 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565501021/real 1565501021] req@ffff8f2c2af16f00 x1636761397496448/t0(0) o106->fir-MDT0000@10.8.26.28@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565501028 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 22:23:48 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 10 22:24:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 22:24:12 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 10 22:25:05 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565501098/real 1565501098] 
req@ffff8f2c2af16f00 x1636761397496448/t0(0) o106->fir-MDT0000@10.8.26.28@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565501105 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 22:25:05 fir-md1-s1 kernel: Lustre: 23759:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Aug 10 22:25:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b53d848e-794d-18dd-d954-82958d426824 (at 10.8.26.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fc74fac00, cur 1565501107 expire 1565500957 last 1565500880 Aug 10 22:25:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 22:25:07 fir-md1-s1 kernel: Lustre: 23759:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:133s); client may timeout. req@ffff8f29f605c800 x1631713633350032/t0(0) o101->0ee78617-a93b-a65f-97e4-7caa3e6fe676@10.8.25.14@o2ib6:24/0 lens 480/536 e 0 to 0 dl 1565500974 ref 1 fl Complete:/0/0 rc 301/301 Aug 10 22:25:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b53d848e-794d-18dd-d954-82958d426824 (at 10.8.26.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2817e98c00, cur 1565501122 expire 1565500972 last 1565500895 Aug 10 22:27:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 22:27:24 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 22:27:32 fir-md1-s1 kernel: Lustre: 21446:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f24e1540900 x1635103426851920/t0(0) o101->b16e4006-ad8f-de37-ede7-21e0aff43fcc@10.8.1.3@o2ib6:7/0 lens 1792/3288 e 0 to 0 dl 1565501257 ref 2 fl Interpret:/0/0 rc 0/0 Aug 10 22:27:35 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565501248/real 1565501248] req@ffff8f175650a400 x1636761399716480/t0(0) o104->fir-MDT0000@10.8.30.10@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565501255 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 10 22:27:35 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 10 22:28:45 fir-md1-s1 kernel: LustreError: 97639:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.30.10@o2ib6) returned error from blocking AST (req@ffff8f175650a400 x1636761399716480 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2d4be13f00/0x5d9ee6c4184273f6 lrc: 4/0,0 mode: PR/PR res: [0x200029db1:0x53:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.30.10@o2ib6 remote: 0x1e5756fca3f38792 expref: 839 pid: 23678 timeout: 4616534 lvb_type: 0 Aug 10 22:28:45 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.30.10@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Aug 10 22:28:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 98s: evicting client at 10.8.30.10@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2d4be13f00/0x5d9ee6c4184273f6 
lrc: 3/0,0 mode: PR/PR res: [0x200029db1:0x53:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.30.10@o2ib6 remote: 0x1e5756fca3f38792 expref: 840 pid: 23678 timeout: 0 lvb_type: 0 Aug 10 22:29:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 940f13c1-9d44-e6c4-c0f7-e72e85e7eee1 (at 10.8.30.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f301b2c3c00, cur 1565501370 expire 1565501220 last 1565501143 Aug 10 22:29:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 22:29:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 22:29:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 22:34:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 22:34:51 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 10 22:38:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 22:38:16 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 22:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 22:45:09 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 22:45:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 22:45:17 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 10 22:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 22:48:22 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 10 22:55:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 22:55:41 
fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 22:58:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 22:58:26 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 10 22:58:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 22:58:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 10 23:06:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 23:06:11 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 23:08:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 23:08:32 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 23:11:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 23:11:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 23:15:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 23:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 23:16:22 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 10 23:18:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 10 23:18:33 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 10 23:23:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 23:23:45 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 10 23:26:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 23:26:33 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 23:26:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fd982697-484c-b715-2ef9-0e798e95e3ed (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d20099400, cur 1565504812 expire 1565504662 last 1565504585 Aug 10 23:26:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 23:27:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fd982697-484c-b715-2ef9-0e798e95e3ed (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17b701b000, cur 1565504823 expire 1565504673 last 1565504596 Aug 10 23:27:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 10 23:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 23:29:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 10 23:30:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 10 23:34:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 10 23:34:32 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 10 23:36:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 23:36:38 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 10 23:40:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 23:40:52 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 10 23:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 10 23:46:51 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 10 23:47:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 23:51:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 10 23:51:59 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 10 23:57:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 10 23:57:30 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 10 23:57:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 10 23:57:30 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 11 00:02:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 00:02:08 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 00:07:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 
11 00:07:46 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 00:14:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 00:14:03 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 00:15:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 00:15:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 00:17:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 00:17:56 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 00:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 00:24:46 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 00:28:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 00:28:55 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 00:30:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 00:30:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 00:31:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 00:36:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 00:36:28 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 00:40:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 00:40:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 00:41:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 00:41:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 00:46:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 00:46:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 00:50:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 00:50:58 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 00:52:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 00:52:47 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 00:57:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 00:57:36 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 01:01:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 01:01:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 01:05:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 01:05:34 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 01:09:43 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 01:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 01:09:48 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 01:11:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 01:11:30 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 01:16:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 01:16:01 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 01:19:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 01:19:52 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 01:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 01:21:36 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 01:26:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 01:26:14 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 01:30:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 01:30:09 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 01:31:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 01:31:44 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 01:37:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing 
former export from same NID Aug 11 01:37:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 01:40:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 01:40:26 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 01:41:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 01:41:59 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 11 01:50:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 01:50:59 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 01:51:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 01:51:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 01:52:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 01:52:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 02:01:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 02:01:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 02:01:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 02:01:55 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 02:02:42 fir-md1-s1 kernel: Lustre: 23571:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f13f0564500 x1641227131862576/t0(0) o36->f7504a0d-490a-d58a-1f75-439227e99fde@10.9.104.27@o2ib4:17/0 lens 496/2888 e 0 to 0 dl 1565514167 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 02:02:46 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.10.21@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e5db6b600/0x5d9ee6c44368864f lrc: 3/0,0 mode: CR/CR res: [0x20002a022:0x11:0x0].0x0 bits 0x9/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.10.21@o2ib6 remote: 0x3771db9f873e2cbb expref: 197 pid: 97662 timeout: 4629226 lvb_type: 0 Aug 11 02:02:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 02:02:53 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 11 02:12:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 02:12:05 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 02:12:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 02:12:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 02:15:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 02:15:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 02:22:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 02:22:11 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 02:24:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 02:24:02 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 02:25:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 02:25:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 02:29:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for 
connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 02:32:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 02:32:27 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 02:34:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 02:34:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 02:36:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 02:36:06 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 02:42:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 02:42:40 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 02:42:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 02:44:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 02:44:30 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 11 02:50:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 02:50:46 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 02:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 02:53:12 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 02:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 02:54:45 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 03:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 03:03:43 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 03:04:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 03:04:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 03:05:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 03:05:19 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 03:14:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 03:14:04 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 03:15:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 03:15:54 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 03:15:54 fir-md1-s1 kernel: Lustre: MGS: Connection 
restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 03:15:54 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 11 03:25:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 03:25:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 03:26:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 03:26:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 03:26:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 03:26:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 03:31:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 03:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 03:36:01 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 03:36:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 03:36:28 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 03:38:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 03:38:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 03:48:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 03:48:06 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 03:48:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 03:48:06 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 11 03:49:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 03:49:41 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 03:58:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 03:58:35 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 11 03:59:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 03:59:03 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 04:00:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 04:00:28 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 04:01:07 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 04:08:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 04:08:50 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 04:09:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 04:09:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 04:19:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 04:19:39 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 04:19:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 04:19:39 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 04:21:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 04:21:11 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 04:30:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 04:30:22 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 04:30:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 04:30:51 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 04:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 04:31:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 04:35:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 04:40:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 04:40:37 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 04:41:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 04:41:23 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 04:42:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 04:42:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 04:42:56 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 04:50:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 04:50:38 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 04:51:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 04:51:24 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 04:56:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 04:56:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 05:01:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 05:02:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 05:02:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 05:02:05 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 05:02:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 05:02:37 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 05:12:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 05:12:27 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 05:12:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 05:12:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 05:13:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 05:13:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 05:23:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 05:23:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 05:23:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 05:23:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 05:24:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 05:24:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 05:33:32 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 05:33:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 05:33:34 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 05:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 05:34:30 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 05:35:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 05:35:55 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 05:45:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 05:45:06 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 05:46:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 05:46:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 05:48:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 05:48:08 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 05:55:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 05:55:24 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 05:56:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 05:56:18 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 06:02:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same 
NID Aug 11 06:02:14 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 06:06:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 06:06:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 06:06:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 06:06:21 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 06:12:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 06:12:44 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 06:16:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 06:16:50 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 06:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 06:17:16 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 06:23:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 06:23:46 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 06:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 06:27:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 06:27:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 06:27:28 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 06:36:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 11 06:37:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 06:37:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 06:37:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 06:37:32 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 11 06:38:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 06:38:22 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 06:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 06:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 06:47:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 11 06:47:34 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 06:48:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 06:48:34 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 06:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 06:57:56 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 06:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 06:57:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 06:58:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 06:58:35 fir-md1-s1 kernel: Lustre: Skipped 
3 previous similar messages Aug 11 07:03:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 07:04:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 07:05:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 07:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 07:08:48 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 07:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 07:08:48 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 07:10:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 07:10:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 07:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 07:19:01 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 07:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 07:19:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 07:23:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 07:23:33 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 07:29:02 
fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 07:29:02 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 07:29:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 07:29:02 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 07:35:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 07:35:48 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 07:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 07:39:04 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 07:39:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 07:39:04 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 07:46:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 07:46:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 07:48:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 07:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 07:49:33 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 07:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 07:49:33 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 07:58:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 07:59:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 07:59:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 07:59:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 07:59:37 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 08:04:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 08:04:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 08:09:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 08:09:43 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 08:09:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 08:09:52 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 08:14:12 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:14:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.2@o2ib6, removing former 
export from same NID Aug 11 08:14:26 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 08:14:32 fir-md1-s1 kernel: LustreError: 46535:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f25313e9050 x1631778750927056/t0(0) o3->cb1e051f-12ef-c393-c1de-bc60ba01debc@10.8.13.11@o2ib6:14/0 lens 488/440 e 1 to 0 dl 1565536484 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:34 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2d0ec77850 x1639153800871600/t0(0) o3->add72913-db86-c787-0af8-87baea28d190@10.8.30.3@o2ib6:13/0 lens 488/440 e 1 to 0 dl 1565536483 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with add72913-db86-c787-0af8-87baea28d190 (at 10.8.30.3@o2ib6), client will retry: rc -110 Aug 11 08:14:36 fir-md1-s1 kernel: Lustre: 20725:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565536469/real 0] req@ffff8f21de5d9e00 x1636761605943728/t0(0) o104->fir-MDT0002@10.8.8.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565536476 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 08:14:36 fir-md1-s1 kernel: Lustre: 20725:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Aug 11 08:14:37 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:14:37 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21f3c28c00 Aug 11 08:14:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with cb1e051f-12ef-c393-c1de-bc60ba01debc (at 10.8.13.11@o2ib6), client will retry: rc -110 Aug 11 08:14:43 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2959d37050 x1631776730219104/t0(0) 
o3->3ef40adc-26c8-dedb-8bee-f48e96b9a452@10.8.24.9@o2ib6:22/0 lens 488/440 e 1 to 0 dl 1565536492 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:43 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Aug 11 08:14:44 fir-md1-s1 kernel: Lustre: 49475:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f2eeb5c50 x1631699925040784/t0(0) o3->1e8725e1-8b53-4b80-244d-97fadcc45330@10.8.21.27@o2ib6:19/0 lens 504/440 e 1 to 0 dl 1565536489 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:45 fir-md1-s1 kernel: Lustre: 21460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f187a7db900 x1631451911998240/t0(0) o36->830961d5-ed21-b9fb-32a0-0946e2dc853d@10.8.24.6@o2ib6:20/0 lens 488/3152 e 1 to 0 dl 1565536490 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:47 fir-md1-s1 kernel: Lustre: 48198:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2959d37050 x1631776730219104/t0(0) o3->3ef40adc-26c8-dedb-8bee-f48e96b9a452@10.8.24.9@o2ib6:22/0 lens 488/440 e 1 to 0 dl 1565536492 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:53 fir-md1-s1 kernel: LustreError: 6550:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2f2eeb0850 x1641511768841168/t0(0) o3->aa945180-3c21-1084-1313-f108e586f772@10.8.25.24@o2ib6:28/0 lens 488/440 e 1 to 0 dl 1565536498 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:53 fir-md1-s1 kernel: Lustre: 48198:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f2eeb0850 x1641511768841168/t0(0) o3->aa945180-3c21-1084-1313-f108e586f772@10.8.25.24@o2ib6:28/0 lens 488/440 e 1 to 0 dl 1565536498 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:56 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform 
health checking (-125, 0) Aug 11 08:14:56 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f11f47d4a00 Aug 11 08:14:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1e8725e1-8b53-4b80-244d-97fadcc45330 (at 10.8.21.27@o2ib6), client will retry: rc -110 Aug 11 08:14:56 fir-md1-s1 kernel: Lustre: 49478:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:7s); client may timeout. req@ffff8f2f2eeb5c50 x1631699925040784/t0(0) o3->1e8725e1-8b53-4b80-244d-97fadcc45330@10.8.21.27@o2ib6:19/0 lens 504/440 e 1 to 0 dl 1565536489 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:14:57 fir-md1-s1 kernel: LustreError: 21293:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1efd698050 x1631810224003072/t0(0) o3->13061d85-51ac-4b0f-0a27-af4e7a3825e8@10.8.22.3@o2ib6:1/0 lens 488/440 e 1 to 0 dl 1565536501 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:57 fir-md1-s1 kernel: LustreError: 21293:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Aug 11 08:14:59 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:14:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b15c7de00 Aug 11 08:14:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with cd075b36-33db-5052-abc8-0d1d7f478890 (at 10.8.30.8@o2ib6), client will retry: rc -110 Aug 11 08:14:59 fir-md1-s1 kernel: Lustre: 49463:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f2eeb2450 x1631712705087712/t0(0) o3->6ca4f333-5bde-fc33-1d5d-b16315817b8a@10.8.10.26@o2ib6:4/0 lens 504/440 e 1 to 0 dl 1565536504 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:14:59 fir-md1-s1 kernel: Lustre: 
49463:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Aug 11 08:15:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2eaea51e00 Aug 11 08:15:00 fir-md1-s1 kernel: Lustre: 22181:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); client may timeout. req@ffff8f2959d37050 x1631776730219104/t0(0) o3->3ef40adc-26c8-dedb-8bee-f48e96b9a452@10.8.24.9@o2ib6:22/0 lens 488/440 e 1 to 0 dl 1565536492 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:15:06 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:15:06 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 11 08:15:06 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20c0046c00 Aug 11 08:15:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 43d4c491-0147-e9a3-8154-08fbbbab65ce (at 10.8.25.11@o2ib6), client will retry: rc -110 Aug 11 08:15:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 08:15:06 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f29ae1e6400 Aug 11 08:15:06 fir-md1-s1 kernel: Lustre: 6550:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); client may timeout. 
req@ffff8f2f2eeb0850 x1641511768841168/t0(0) o3->aa945180-3c21-1084-1313-f108e586f772@10.8.25.24@o2ib6:28/0 lens 488/440 e 1 to 0 dl 1565536498 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:15:06 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2959d31c50 x1641511904862800/t0(0) o3->eea156e1-84b6-e6c2-2cc1-e5d6774978c8@10.8.30.27@o2ib6:21/0 lens 488/440 e 1 to 0 dl 1565536521 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:06 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Aug 11 08:15:10 fir-md1-s1 kernel: Lustre: 46581:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f25313ec450 x1631684847227968/t0(0) o3->88ec999f-c6f4-0281-c377-b70d1594553b@10.8.12.29@o2ib6:15/0 lens 488/440 e 1 to 0 dl 1565536515 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:10 fir-md1-s1 kernel: Lustre: 46581:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 11 08:15:11 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f199afe3c00 Aug 11 08:15:11 fir-md1-s1 kernel: Lustre: 21293:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:10s); client may timeout. 
req@ffff8f1efd698050 x1631810224003072/t0(0) o3->13061d85-51ac-4b0f-0a27-af4e7a3825e8@10.8.22.3@o2ib6:1/0 lens 488/440 e 1 to 0 dl 1565536501 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:15:15 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:15:15 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 11 08:15:15 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32d39a4800 Aug 11 08:15:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 6ca4f333-5bde-fc33-1d5d-b16315817b8a (at 10.8.10.26@o2ib6), client will retry: rc -110 Aug 11 08:15:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 08:15:15 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2af8c41e00 Aug 11 08:15:16 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20e26afc00 Aug 11 08:15:20 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f223875f000 Aug 11 08:15:20 fir-md1-s1 kernel: Lustre: 46535:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:5s); client may timeout. 
req@ffff8f25313ec450 x1631684847227968/t0(0) o3->88ec999f-c6f4-0281-c377-b70d1594553b@10.8.12.29@o2ib6:15/0 lens 488/440 e 1 to 0 dl 1565536515 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:15:20 fir-md1-s1 kernel: Lustre: 46535:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 11 08:15:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 55s: evicting client at 10.8.8.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2988b21200/0x5d9ee6c4790a6f20 lrc: 3/0,0 mode: PR/PR res: [0x2c002c85f:0x14547:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3ed3a90a84 expref: 17735 pid: 21415 timeout: 4651584 lvb_type: 0 Aug 11 08:15:25 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34e1e3ba00 Aug 11 08:15:25 fir-md1-s1 kernel: LustreError: 20725:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2b4ce6d400 ns: mdt-fir-MDT0002_UUID lock: ffff8f24bfad3f00/0x5d9ee6c4797bcb13 lrc: 3/0,0 mode: EX/EX res: [0x2c002c85f:0x14547:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x50000000000000 nid: 10.8.8.31@o2ib6 remote: 0x4d059c3ed3a96b50 expref: 12270 pid: 20725 timeout: 0 lvb_type: 3 Aug 11 08:15:30 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2959d31850 x1638720273557168/t0(0) o3->9d6ae479-a842-fc0f-dbde-e06192ae8a5e@10.8.13.27@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1565536551 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:30 fir-md1-s1 kernel: LustreError: 46516:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Aug 11 08:15:30 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f345bab6000 Aug 11 08:15:30 fir-md1-s1 kernel: Lustre: 22181:0:(service.c:2165:ptlrpc_server_handle_request()) 
@@@ Request took longer than estimated (20:9s); client may timeout. req@ffff8f2959d31c50 x1641511904862800/t0(0) o3->eea156e1-84b6-e6c2-2cc1-e5d6774978c8@10.8.30.27@o2ib6:21/0 lens 488/440 e 1 to 0 dl 1565536521 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:15:31 fir-md1-s1 kernel: Lustre: 22181:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Aug 11 08:15:31 fir-md1-s1 kernel: Lustre: 46535:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d06737850 x1631807273560096/t0(0) o3->df4f5b31-9da9-6b7b-4719-3abada4a7973@10.8.23.31@o2ib6:6/0 lens 488/440 e 1 to 0 dl 1565536536 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:31 fir-md1-s1 kernel: Lustre: 46535:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 11 08:15:36 fir-md1-s1 kernel: LustreError: 21293:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f1d06737850 x1631807273560096/t0(0) o3->df4f5b31-9da9-6b7b-4719-3abada4a7973@10.8.23.31@o2ib6:6/0 lens 488/440 e 1 to 0 dl 1565536536 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:41 fir-md1-s1 kernel: LustreError: 49462:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2e227b7850 x1634521793678464/t0(0) o3->3fc7f286-0df6-b862-cc08-00139bfcf834@10.8.27.20@o2ib6:11/0 lens 488/440 e 0 to 0 dl 1565536541 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:47 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f25313ec450 x1631709102110048/t0(0) o3->f8938193-b6f4-691f-a9ed-5d03b37d98de@10.8.30.11@o2ib6:17/0 lens 488/440 e 1 to 0 dl 1565536547 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:49 fir-md1-s1 kernel: LustreError: 49475:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2959d30450 x1631594671328880/t0(0) o3->d1612639-ba09-5523-fd87-6391497129b4@10.8.18.19@o2ib6:19/0 lens 
488/440 e 0 to 0 dl 1565536549 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:15:49 fir-md1-s1 kernel: LustreError: 49475:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Aug 11 08:15:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 58s: evicting client at 10.8.20.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f3355e2bcc0/0x5d9ee6c4792f25bf lrc: 4/0,0 mode: PR/PR res: [0x20002983e:0x1447b:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.20.14@o2ib6 remote: 0x1090d60366369949 expref: 17 pid: 20725 timeout: 4651618 lvb_type: 0 Aug 11 08:16:12 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f19659f7050 x1635621526900416/t0(0) o3->d072205a-1b1b-636c-7696-e9d92af1edee@10.8.20.3@o2ib6:17/0 lens 488/440 e 0 to 0 dl 1565536577 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:16:12 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Aug 11 08:16:16 fir-md1-s1 kernel: LustreError: 49478:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2959d35450 x1631951338886992/t0(0) o3->ca76b195-822f-9abb-2230-05894a3e9cc7@10.8.30.7@o2ib6:1/0 lens 488/440 e 1 to 0 dl 1565536591 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:16:16 fir-md1-s1 kernel: LustreError: 49478:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Aug 11 08:16:41 fir-md1-s1 kernel: LustreError: 49463:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2d0ec77850 x1634275718136032/t0(0) o3->9eb88991-51bc-5034-bc55-1b8fa8295e05@10.8.27.24@o2ib6:11/0 lens 488/440 e 0 to 0 dl 1565536601 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:17:02 fir-md1-s1 kernel: Lustre: 23455:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565536615/real 0] 
req@ffff8f2c2bf82a00 x1636761606613104/t0(0) o104->fir-MDT0000@10.8.1.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565536622 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 08:17:02 fir-md1-s1 kernel: Lustre: 23455:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 11 08:17:20 fir-md1-s1 kernel: Lustre: 21460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1e2f4c8600 x1631572574689152/t0(0) o101->b1560181-32d0-3000-87fb-1969e5df2f5e@10.9.101.68@o2ib4:25/0 lens 1808/3288 e 0 to 0 dl 1565536645 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:17:20 fir-md1-s1 kernel: Lustre: 21460:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 11 08:17:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.1.29@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f28fd8d72c0/0x5d9ee6c47213752e lrc: 4/0,0 mode: PR/PR res: [0x200029dc6:0xe8ba:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.1.29@o2ib6 remote: 0x10b8451183747e24 expref: 18 pid: 21415 timeout: 4651704 lvb_type: 0 Aug 11 08:17:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.18.19@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e35fc3a80/0x5d9ee6c4769d596d lrc: 4/0,0 mode: PR/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 27 type: IBT flags: 0x60200400000020 nid: 10.8.18.19@o2ib6 remote: 0xfc01738500ab1fc1 expref: 334 pid: 21460 timeout: 4651710 lvb_type: 0 Aug 11 08:17:42 fir-md1-s1 kernel: LustreError: 22181:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2959d32450 x1634275718888112/t0(0) o3->9eb88991-51bc-5034-bc55-1b8fa8295e05@10.8.27.24@o2ib6:10/0 lens 488/440 e 0 to 0 dl 1565536690 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:18:21 fir-md1-s1 kernel: LNet: 
Service thread pid 23666 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 08:18:21 fir-md1-s1 kernel: Pid: 23666, comm: mdt03_076 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 08:18:21 fir-md1-s1 kernel: Call Trace: Aug 11 08:18:21 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 08:18:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 08:18:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 08:18:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 08:18:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 
08:18:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536701.23666 Aug 11 08:18:31 fir-md1-s1 kernel: LNet: Service thread pid 49462 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 08:18:31 fir-md1-s1 kernel: Pid: 49462, comm: mdt_io02_095 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 08:18:31 fir-md1-s1 kernel: Call Trace: Aug 11 08:18:31 fir-md1-s1 kernel: [] ptlrpc_abort_bulk+0x252/0x350 [ptlrpc] Aug 11 08:18:31 fir-md1-s1 kernel: [] target_bulk_io+0x6ad/0xab0 [ptlrpc] Aug 11 08:18:31 fir-md1-s1 kernel: [] tgt_brw_read+0xcbd/0x1e50 [ptlrpc] Aug 11 08:18:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 08:18:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 08:18:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 08:18:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 08:18:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 08:18:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 08:18:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536711.49462 Aug 11 08:18:32 fir-md1-s1 kernel: LustreError: 21428:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565536622, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f14e301b3c0/0x5d9ee6c479a0f563 lrc: 3/1,0 mode: --/PR res: [0x200029fd6:0x2bc:0x0].0x0 bits 0x13/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21428 timeout: 0 lvb_type: 0 Aug 11 08:18:32 fir-md1-s1 kernel: LustreError: 21428:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 11 08:18:36 fir-md1-s1 kernel: LNet: Service thread pid 21293 was inactive for 200.20s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 08:18:36 fir-md1-s1 kernel: Pid: 21293, comm: mdt_io01_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 08:18:36 fir-md1-s1 kernel: Call Trace: Aug 11 08:18:36 fir-md1-s1 kernel: [] ptlrpc_abort_bulk+0x252/0x350 [ptlrpc] Aug 11 08:18:36 fir-md1-s1 kernel: [] target_bulk_io+0x6ad/0xab0 [ptlrpc] Aug 11 08:18:36 fir-md1-s1 kernel: [] tgt_brw_read+0xcbd/0x1e50 [ptlrpc] Aug 11 08:18:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 08:18:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 08:18:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 08:18:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 08:18:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 08:18:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 08:18:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536716.21293 Aug 11 08:18:39 fir-md1-s1 kernel: LNet: Service thread pid 46510 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 11 08:18:39 fir-md1-s1 kernel: Pid: 46510, comm: mdt_io02_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 08:18:39 fir-md1-s1 kernel: Call Trace: Aug 11 08:18:39 fir-md1-s1 kernel: [] ptlrpc_abort_bulk+0x252/0x350 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] target_bulk_io+0x6ad/0xab0 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] tgt_brw_read+0xcbd/0x1e50 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 08:18:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 08:18:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 08:18:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536719.46510 Aug 11 08:18:39 fir-md1-s1 kernel: Pid: 49475, comm: mdt_io02_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 08:18:39 fir-md1-s1 kernel: Call Trace: Aug 11 08:18:39 fir-md1-s1 kernel: [] ptlrpc_abort_bulk+0x252/0x350 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] target_bulk_io+0x6ad/0xab0 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] tgt_brw_read+0xcbd/0x1e50 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 08:18:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 08:18:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 08:18:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 08:18:41 fir-md1-s1 kernel: LNet: Service thread pid 46516 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 11 08:18:41 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Aug 11 08:18:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536721.46516 Aug 11 08:18:48 fir-md1-s1 kernel: LNet: Service thread pid 46581 was inactive for 200.20s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 08:18:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536728.46581 Aug 11 08:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e6770383-b71d-26bd-2ffa-8df05e7f3814 (at 10.8.9.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e5fd4c000, cur 1565536740 expire 1565536590 last 1565536513 Aug 11 08:19:07 fir-md1-s1 kernel: LNet: Service thread pid 21712 was inactive for 200.43s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 08:19:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536747.21712 Aug 11 08:19:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c3d06e43-ba14-2103-7ba5-2c78b01fb285 (at 10.8.9.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f248036bc00, cur 1565536750 expire 1565536600 last 1565536523 Aug 11 08:19:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 08:19:31 fir-md1-s1 kernel: LNet: Service thread pid 49478 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 11 08:19:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536771.49478 Aug 11 08:19:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536772.49463 Aug 11 08:19:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 084aa447-2226-ef62-c8a3-d90ca03f5dde (at 10.8.16.8@o2ib6) Aug 11 08:19:43 fir-md1-s1 kernel: Lustre: Skipped 9018 previous similar messages Aug 11 08:19:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0556c970-f843-3f2b-1d1c-0b06884878cc (at 10.8.13.22@o2ib6) reconnecting Aug 11 08:19:52 fir-md1-s1 kernel: Lustre: Skipped 5919 previous similar messages Aug 11 08:20:16 fir-md1-s1 kernel: LNet: Service thread pid 23455 was inactive for 200.53s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 08:20:16 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 11 08:20:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536816.23455 Aug 11 08:20:21 fir-md1-s1 kernel: LustreError: 20734:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565536731, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f6225e0c0/0x5d9ee6c479b742a0 lrc: 3/1,0 mode: --/PR res: [0x20002983e:0x1447b:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20734 timeout: 0 lvb_type: 0 Aug 11 08:20:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536822.21428 Aug 11 08:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6baac375-c077-0a49-17c4-c5d9cfead043 (at 10.8.20.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f250a6bbc00, cur 1565536824 expire 1565536674 last 1565536597 Aug 11 08:20:30 fir-md1-s1 kernel: Lustre: 46516:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f2dd646fc00 Aug 11 08:20:36 fir-md1-s1 kernel: Lustre: 21293:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f1abbd5d200 Aug 11 08:20:41 fir-md1-s1 kernel: Lustre: 49462:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f2eac678c00 Aug 11 08:20:43 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:20:43 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 11 08:20:47 fir-md1-s1 kernel: Lustre: 46581:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f1e44fd9a00 Aug 11 08:20:53 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2eac678c00 Aug 11 08:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3fc7f286-0df6-b862-cc08-00139bfcf834 (at 10.8.27.20@o2ib6), client will retry: rc -110 Aug 11 08:20:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 08:20:53 fir-md1-s1 kernel: Lustre: 49462:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:312s); client may timeout. req@ffff8f2e227b7850 x1634521793678464/t0(0) o3->3fc7f286-0df6-b862-cc08-00139bfcf834@10.8.27.20@o2ib6:11/0 lens 488/440 e 0 to 0 dl 1565536541 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:20:53 fir-md1-s1 kernel: LNet: Service thread pid 49462 completed after 342.54s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 11 08:20:53 fir-md1-s1 kernel: LNet: Skipped 60 previous similar messages Aug 11 08:21:00 fir-md1-s1 kernel: Lustre: 21712:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f1ef49ef800 Aug 11 08:21:00 fir-md1-s1 kernel: Lustre: 21712:0:(niobuf.c:303:ptlrpc_abort_bulk()) Skipped 2 previous similar messages Aug 11 08:21:00 fir-md1-s1 kernel: LNet: Service thread pid 22181 was inactive for 200.12s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 08:21:00 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 11 08:21:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536860.22181 Aug 11 08:21:16 fir-md1-s1 kernel: Lustre: 49478:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f3080d95600 Aug 11 08:21:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d1612639-ba09-5523-fd87-6391497129b4 (at 10.8.18.19@o2ib6) in 201 seconds. I think it's dead, and I am evicting it. exp ffff8f1c43df0000, cur 1565536900 expire 1565536750 last 1565536699 Aug 11 08:21:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 08:21:41 fir-md1-s1 kernel: Lustre: 49463:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f2c7b01ce00 Aug 11 08:22:11 fir-md1-s1 kernel: LNet: Service thread pid 20734 was inactive for 200.32s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 11 08:22:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565536931.20734 Aug 11 08:22:12 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:22:12 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 11 08:22:12 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1abbd5d200 Aug 11 08:22:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with df4f5b31-9da9-6b7b-4719-3abada4a7973 (at 10.8.23.31@o2ib6), client will retry: rc -110 Aug 11 08:22:12 fir-md1-s1 kernel: Lustre: 21293:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:396s); client may timeout. req@ffff8f1d06737850 x1631807273560096/t0(0) o3->df4f5b31-9da9-6b7b-4719-3abada4a7973@10.8.23.31@o2ib6:6/0 lens 488/440 e 1 to 0 dl 1565536536 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:22:12 fir-md1-s1 kernel: LNet: Service thread pid 21293 completed after 416.25s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 08:22:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c6e7a245-976c-f1da-2930-5dafca10acda (at 10.8.31.8@o2ib6) in 207 seconds. I think it's dead, and I am evicting it. exp ffff8f2521041000, cur 1565536936 expire 1565536786 last 1565536729 Aug 11 08:22:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 08:22:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 30914fa9-f16d-1c3c-9f79-80d10d6d2efb (at 10.8.25.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f14d3928c00, cur 1565536953 expire 1565536803 last 1565536726 Aug 11 08:22:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 08:22:42 fir-md1-s1 kernel: Lustre: 22181:0:(niobuf.c:303:ptlrpc_abort_bulk()) Unexpectedly long timeout: desc ffff8f311abd8a00 Aug 11 08:22:46 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565536959/real 0] req@ffff8f2c2bf81800 x1636761608491232/t0(0) o104->fir-MDT0000@10.8.30.27@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565536966 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 08:22:46 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 11 08:22:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aa060d18-c2f6-ae26-8540-50f8ac146bb2 (at 10.8.22.36@o2ib6) in 204 seconds. I think it's dead, and I am evicting it. exp ffff8f2522b46c00, cur 1565536976 expire 1565536826 last 1565536772 Aug 11 08:22:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 08:23:04 fir-md1-s1 kernel: Lustre: 97662:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20fbbb1200 x1638784693535392/t0(0) o101->3ac1581a-a94e-22b3-2bf3-b18d4bc33b46@10.9.104.26@o2ib4:9/0 lens 1792/3288 e 0 to 0 dl 1565536989 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:23:04 fir-md1-s1 kernel: Lustre: 97662:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 11 08:23:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.10@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 08:23:07 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 11 08:23:08 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f30f047be00 Aug 11 08:23:08 fir-md1-s1 kernel: LNet: Service thread pid 46510 completed after 469.93s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 08:23:21 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:23:21 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 11 08:23:21 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f321f5da400 Aug 11 08:23:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with d1612639-ba09-5523-fd87-6391497129b4 (at 10.8.18.19@o2ib6), client will retry: rc -110 Aug 11 08:23:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 08:23:22 fir-md1-s1 kernel: Lustre: 49475:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:453s); client may timeout. req@ffff8f2959d30450 x1631594671328880/t0(0) o3->d1612639-ba09-5523-fd87-6391497129b4@10.8.18.19@o2ib6:19/0 lens 488/440 e 0 to 0 dl 1565536549 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:23:22 fir-md1-s1 kernel: Lustre: 49475:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 11 08:23:22 fir-md1-s1 kernel: LNet: Service thread pid 49475 completed after 482.12s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 11 08:23:44 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2dd646fc00 Aug 11 08:23:44 fir-md1-s1 kernel: LNet: Service thread pid 46516 completed after 502.51s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 08:23:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 20e9a096-1550-beb2-ef57-b09d94252e63 (at 10.8.31.3@o2ib6) in 223 seconds. I think it's dead, and I am evicting it. exp ffff8f2523409800, cur 1565537025 expire 1565536875 last 1565536802 Aug 11 08:24:10 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e44fd9a00 Aug 11 08:24:10 fir-md1-s1 kernel: LNet: Service thread pid 46581 completed after 522.50s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 08:24:18 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ef49ef800 Aug 11 08:24:20 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3080d95600 Aug 11 08:24:21 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2c7b01ce00 Aug 11 08:24:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.3@o2ib6, removing former export from same NID Aug 11 08:24:27 fir-md1-s1 kernel: Lustre: Skipped 5171 previous similar messages Aug 11 08:24:28 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f311abd8a00 Aug 11 08:24:28 fir-md1-s1 kernel: LNet: Service thread pid 22181 completed after 408.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 11 08:24:28 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Aug 11 08:24:37 fir-md1-s1 kernel: LustreError: 22891:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.31.3@o2ib6 arrived at 1565537077 with bad export cookie 6746082289091915459 Aug 11 08:24:37 fir-md1-s1 kernel: LustreError: 22891:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages Aug 11 08:24:55 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1d06732050 x1631776730219104/t0(0) o3->3ef40adc-26c8-dedb-8bee-f48e96b9a452@10.8.24.9@o2ib6:6/0 lens 488/440 e 1 to 0 dl 1565537106 ref 1 fl Interpret:/2/0 rc 0/0 Aug 11 08:25:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 87d4e5a2-28f4-1320-78c3-d40b179c856f (at 10.8.12.11@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. exp ffff8f250174f800, cur 1565537101 expire 1565536951 last 1565536877 Aug 11 08:25:01 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 08:25:15 fir-md1-s1 kernel: LustreError: 69435:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1d06731850 x1631323381464000/t0(0) o3->27d764b1-f6cd-d678-112d-32dca2473c6a@10.8.26.24@o2ib6:15/0 lens 488/440 e 0 to 0 dl 1565537115 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:25:34 fir-md1-s1 kernel: LustreError: 24569:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2e227b5850 x1639198401519536/t0(0) o3->31f3bb9d-0475-0811-b0fe-ab79fcaad5b2@10.8.31.1@o2ib6:4/0 lens 488/440 e 0 to 0 dl 1565537134 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:25:34 fir-md1-s1 kernel: LustreError: 24569:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 15 previous similar messages Aug 11 08:25:36 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 08:25:36 fir-md1-s1 kernel: LNetError: 
20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Aug 11 08:25:36 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f233236ac00 Aug 11 08:25:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 5384d722-d31f-f247-a0ce-385afe618694 (at 10.8.18.23@o2ib6), client will retry: rc -110 Aug 11 08:25:36 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 08:25:36 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:21s); client may timeout. req@ffff8f25313ea050 x1633757747533968/t0(0) o3->5384d722-d31f-f247-a0ce-385afe618694@10.8.18.23@o2ib6:15/0 lens 488/440 e 0 to 0 dl 1565537115 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 08:25:37 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Aug 11 08:25:37 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f251ffe6e00 Aug 11 08:25:42 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f26b9b4a800 Aug 11 08:25:42 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f18675ed000 Aug 11 08:25:43 fir-md1-s1 kernel: LustreError: 20465:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.25.1@o2ib6) failed to reply to blocking AST (req@ffff8f2b9853ce00 x1636761608789184 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2b83cfc140/0x5d9ee6c47a0e36a0 lrc: 4/0,0 mode: PW/PW res: [0x200029ad4:0x8d83:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.25.1@o2ib6 remote: 0xb537eab6f563a2ef expref: 18 pid: 21671 timeout: 4652176 lvb_type: 0 Aug 11 08:25:43 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event 
type 5, status -125, desc ffff8f2c43df6800 Aug 11 08:25:43 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.25.1@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 11 08:25:44 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f269d965a00 Aug 11 08:25:44 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ef3959000 Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ef28aba00 Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: 21671:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.29.8@o2ib6) failed to reply to blocking AST (req@ffff8f2b9853d400 x1636761608793296 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f213b0a0480/0x5d9ee6c47a0e2892 lrc: 4/0,0 mode: PR/PR res: [0x200029790:0x5ba:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.29.8@o2ib6 remote: 0xfccaff928fcf563a expref: 3998 pid: 22279 timeout: 4652176 lvb_type: 0 Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.29.8@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 58s: evicting client at 10.8.20.1@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f251fd8a880/0x5d9ee6c47a0e3668 lrc: 3/0,0 mode: PW/PW res: [0x200029a5a:0x326c:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.20.1@o2ib6 remote: 0x9158c0545fb1adf4 expref: 20 pid: 97651 timeout: 0 lvb_type: 0 Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: 23692:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on 
destroyed export ffff8f2520c20c00 ns: mdt-fir-MDT0000_UUID lock: ffff8f2de57460c0/0x5d9ee6c47a100caa lrc: 3/0,0 mode: PR/PR res: [0x200029a5a:0x326c:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.8.20.1@o2ib6 remote: 0x9158c0545fb1ae33 expref: 4 pid: 23692 timeout: 0 lvb_type: 0 Aug 11 08:25:45 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 11 08:25:46 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3233206a00 Aug 11 08:25:46 fir-md1-s1 kernel: LustreError: 21671:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4122fb8400 ns: mdt-fir-MDT0000_UUID lock: ffff8f28b16c6540/0x5d9ee6c47a100af1 lrc: 3/0,0 mode: PW/PW res: [0x200029790:0x5ba:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x50200000000000 nid: 10.8.29.8@o2ib6 remote: 0xfccaff928fcf58d3 expref: 890 pid: 21671 timeout: 0 lvb_type: 0 Aug 11 08:25:46 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3269b42400 Aug 11 08:25:47 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f221c4b5800 Aug 11 08:25:48 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2d25d7d400 Aug 11 08:25:49 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e3160de00 Aug 11 08:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 470a6da9-6d35-ef8d-326a-515adf418037 (at 10.8.29.3@o2ib6), client will retry: rc = -110 Aug 11 08:25:49 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34e4569e00 Aug 11 08:25:49 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32633c7c00 Aug 11 08:25:49 
fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2949ed6c00 Aug 11 08:25:49 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1daa9f6400 Aug 11 08:25:49 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f1c912800 Aug 11 08:25:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 08:25:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2aad3e1200 Aug 11 08:25:50 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28c193e800 Aug 11 08:25:50 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2cbaae9600 Aug 11 08:25:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client will retry: rc = -110 Aug 11 08:25:50 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28c193f400 Aug 11 08:25:51 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f29f603d800 Aug 11 08:25:51 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f288b615a00 Aug 11 08:25:52 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2859b3e800 Aug 11 08:25:52 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f290669cc00 Aug 11 08:25:55 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f25086c7c00 Aug 11 08:25:57 fir-md1-s1 kernel: Lustre: 
24577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565537150/real 0] req@ffff8f2210725700 x1636761609231024/t0(0) o106->fir-MDT0000@10.8.22.18@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565537157 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 08:25:57 fir-md1-s1 kernel: Lustre: 24577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 11 08:25:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.9.10@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 08:25:59 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 11 08:26:08 fir-md1-s1 kernel: LustreError: 55536:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2d0cada850 x1631914194468032/t0(0) o256->9bef7816-9b4d-e96b-20a1-6edcd3b16fdb@10.8.30.34@o2ib6:8/0 lens 304/240 e 1 to 0 dl 1565537168 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 08:26:08 fir-md1-s1 kernel: LustreError: 55536:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Aug 11 08:27:20 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f27f887ea00 Aug 11 08:27:24 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3003ce7500 x1641511567318016/t0(0) o101->6e65c769-1e57-0210-a51a-c6897929431a@10.9.114.8@o2ib4:29/0 lens 584/3264 e 0 to 0 dl 1565537249 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 08:27:24 fir-md1-s1 kernel: Lustre: 23603:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 58 previous similar messages Aug 11 08:27:29 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e93c55a00 Aug 11 08:27:37 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 
5, status -125, desc ffff8f166fb27600 Aug 11 08:28:18 fir-md1-s1 kernel: LustreError: 23666:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565537207, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3dafe0af40/0x5d9ee6c47a34a669 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 156 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23666 timeout: 0 lvb_type: 0 Aug 11 08:28:24 fir-md1-s1 kernel: LustreError: 20725:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565537214, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3124275c40/0x5d9ee6c47a366106 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 157 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20725 timeout: 0 lvb_type: 0 Aug 11 08:28:24 fir-md1-s1 kernel: LustreError: 20725:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 11 08:28:29 fir-md1-s1 kernel: LustreError: 20734:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565537219, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f26aeebda00/0x5d9ee6c47a383080 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 156 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20734 timeout: 0 lvb_type: 0 Aug 11 08:28:35 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c3d06e43-ba14-2103-7ba5-2c78b01fb285 (at 10.8.9.10@o2ib6) in 174 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0e5a318400, cur 1565537315 expire 1565537165 last 1565537141 Aug 11 08:28:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 08:28:38 fir-md1-s1 kernel: LustreError: 20738:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565537228, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f143127b3c0/0x5d9ee6c47a3b4b5b lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 157 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20738 timeout: 0 lvb_type: 0 Aug 11 08:28:38 fir-md1-s1 kernel: LustreError: 20738:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ef20ad800 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e1c61b800 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1ef20ac400 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22baf00a00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22ae48ee00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3a0ae56a00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f226e7ace00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2edd93e400 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 
20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3035e95000 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24f8885e00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3a0ae57e00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1415452c00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24eb2cba00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2817fbea00 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f07c6ff1400 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0d1039d600 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f180d31c400 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 114s: evicting client at 10.8.12.35@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2522740d80/0x5d9ee6c47a24e822 lrc: 4/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 157 type: IBT flags: 0x60200400000020 nid: 10.8.12.35@o2ib6 remote: 0xe4208745d716cdbc expref: 171 pid: 23455 timeout: 4652295 lvb_type: 0 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3223449000 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 23581:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.9@o2ib6) failed to reply to blocking 
AST (req@ffff8f2cdf6d3c00 x1636761609580848 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f167ea1ba80/0x5d9ee6c478e6e343 lrc: 4/0,0 mode: PR/PR res: [0x200029b06:0x67:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.8.26.9@o2ib6 remote: 0xd8d4d87c636661f8 expref: 18 pid: 97662 timeout: 4652287 lvb_type: 0 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 23581:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.26.9@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 25078:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.12.5@o2ib6 arrived at 1565537320 with bad export cookie 6746082289101561515 Aug 11 08:28:40 fir-md1-s1 kernel: LustreError: 25078:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Aug 11 08:28:54 fir-md1-s1 kernel: LustreError: 23603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565537244, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2d25d43180/0x5d9ee6c47a3ff8a1 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 111 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23603 timeout: 0 lvb_type: 0 Aug 11 08:28:54 fir-md1-s1 kernel: LustreError: 23603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages Aug 11 08:29:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.7.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f251fd8e780/0x5d9ee6c47a0e34bd lrc: 3/0,0 mode: PR/PR res: 
[0x20002983e:0x1447b:0x0].0x0 bits 0x13/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.8.7.8@o2ib6 remote: 0xd9ff6091520cf15d expref: 17 pid: 97651 timeout: 4652409 lvb_type: 0 Aug 11 08:29:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 29 previous similar messages Aug 11 08:29:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2b7395d9-c40c-f531-147e-33ca0a08dcda (at 10.8.22.12@o2ib6) Aug 11 08:29:43 fir-md1-s1 kernel: Lustre: Skipped 13925 previous similar messages Aug 11 08:30:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 08:30:01 fir-md1-s1 kernel: Lustre: Skipped 9097 previous similar messages Aug 11 08:36:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 08:36:10 fir-md1-s1 kernel: Lustre: Skipped 2662 previous similar messages Aug 11 08:40:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 08:40:18 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 11 08:41:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 08:41:40 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 08:46:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 08:46:24 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 08:51:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 08:51:17 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 08:51:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 08:51:57 fir-md1-s1 kernel: Lustre: 
Skipped 15 previous similar messages Aug 11 08:57:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 08:57:15 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 09:01:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 09:01:47 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 09:03:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 09:03:11 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 09:09:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 09:09:12 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 09:11:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 09:11:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 09:13:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 09:13:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 09:19:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 09:19:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 09:22:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 09:22:27 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 11 09:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 09:23:52 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 09:29:26 fir-md1-s1 
kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 09:29:26 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 09:32:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 09:32:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 09:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 09:34:00 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 09:43:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 09:43:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 09:43:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 09:43:38 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 09:44:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 09:44:07 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 09:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 09:54:17 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 09:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 09:54:17 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 11 09:54:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 09:54:44 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 10:04:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 10:04:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 10:04:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 10:04:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 10:07:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 10:07:06 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 10:08:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 18e5029e-fcf3-03cd-9942-b60bc6df07b1 (at 10.9.107.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f36a3eeb800, cur 1565543322 expire 1565543172 last 1565543095 Aug 11 10:08:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 10:09:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 10:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 10:14:44 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 10:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 10:14:44 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 10:17:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 10:17:18 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 10:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 10:25:29 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 10:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 10:25:29 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 10:27:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 10:27:35 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 10:36:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 10:36:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 10:36:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 10:36:05 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 10:38:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 10:38:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 10:46:06 fir-md1-s1 kernel: Lustre: MGS: 
Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 10:46:06 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 10:46:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 10:46:07 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 10:53:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 10:53:14 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 10:56:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 10:56:39 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 10:56:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 10:56:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 11:03:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 11:03:19 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 11:05:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3fe59a07-9941-fba7-da04-3312f1ded35e (at 10.9.101.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f192847ec00, cur 1565546725 expire 1565546575 last 1565546498 Aug 11 11:05:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 11:07:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 11:07:19 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 11:07:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 11:07:19 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 11 11:15:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 11:15:56 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 11:17:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 11:17:46 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 11:17:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 11:17:48 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 11:26:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 11:26:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 11:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 11:27:59 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 11:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 11:27:59 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 11:36:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from 
same NID Aug 11 11:36:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 11:38:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 11:38:39 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 11:38:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 11:38:39 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 11:46:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 11:46:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 11:48:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 11:48:42 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 11:48:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 11:48:42 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 11:57:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f111b25a-6d2a-16a8-5df8-392d9e810365 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2c5de2e400, cur 1565549830 expire 1565549680 last 1565549603 Aug 11 11:57:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 11:58:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 11:58:58 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 11:58:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 11:58:58 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 11 12:02:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 12:02:40 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 12:09:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 12:09:05 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 12:09:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 12:09:30 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 12:12:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 12:12:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 12:19:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 12:19:51 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 12:19:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 12:19:51 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 11 12:29:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from 
same NID Aug 11 12:29:10 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 12:29:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 12:29:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 12:29:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 12:29:56 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 12:39:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 12:39:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 12:40:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 12:40:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 12:42:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 12:42:03 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 12:50:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 12:50:40 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 12:50:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 12:50:40 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 12:52:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 12:52:07 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 13:00:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 13:00:41 fir-md1-s1 kernel: Lustre: Skipped 15 
previous similar messages Aug 11 13:00:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 13:00:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 13:03:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 13:03:15 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 13:12:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 13:12:00 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 13:14:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 13:14:17 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 13:16:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 13:16:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 13:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 13:22:14 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 13:25:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 13:25:03 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 13:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 07cc2193-517e-0739-7ac3-8fdf24fb53fa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2455bd9800, cur 1565555192 expire 1565555042 last 1565554965 Aug 11 13:26:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 13:29:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 13:29:36 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 13:32:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 13:32:49 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 13:34:55 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d4bf49e00 x1641237401134144/t0(0) o101->6528a72c-7dbf-d506-86e5-e12b1d6e7573@10.8.15.3@o2ib6:0/0 lens 480/568 e 0 to 0 dl 1565555700 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 13:34:55 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Aug 11 13:35:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 13:35:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 13:42:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 13:42:27 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 13:42:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 13:42:54 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 13:45:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 13:45:38 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 13:47:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
498a78c9-7f3d-de09-8574-7071fa200ea6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25955a0400, cur 1565556455 expire 1565556305 last 1565556228 Aug 11 13:47:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 13:53:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 13:53:12 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 13:55:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 13:55:19 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 13:55:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 13:55:47 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 14:03:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 14:03:21 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 14:05:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 14:05:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 14:05:19 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 14:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 14:07:01 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 14:08:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 14:09:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 14:12:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7605d08e-7d71-4460-f36e-51e75e52e4a6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ff7a7f400, cur 1565557926 expire 1565557776 last 1565557699 Aug 11 14:12:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 14:13:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 14:13:24 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 14:14:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 14:15:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 14:16:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 14:16:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 14:17:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 14:17:10 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 14:17:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 14:23:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 14:23:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 14:27:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 14:27:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 14:27:28 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 14:27:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 14:31:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 14:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 14:33:53 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 11 14:38:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 14:38:03 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 14:38:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 14:38:31 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 14:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 14:44:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 14:49:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 14:49:02 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 14:49:39 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 14:49:39 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 14:54:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 14:54:12 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 15:00:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 15:00:22 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 15:01:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 15:01:22 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 15:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 15:05:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 15:05:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 15:11:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 15:11:44 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 15:13:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 15:13:19 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 15:15:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 15:15:53 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 15:17:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a4af61c4-0134-22da-2b06-8806116f47c7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fec1ee800, cur 1565561832 expire 1565561682 last 1565561605 Aug 11 15:17:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 15:21:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 15:21:54 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 15:26:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6a4760b0-b898-33aa-fe75-192d36e634fa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f16cc0f0400, cur 1565562406 expire 1565562256 last 1565562179 Aug 11 15:26:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 15:27:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 15:27:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 15:31:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 15:31:59 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 15:34:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 15:34:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 15:36:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 15:36:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 15:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 15:37:12 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 15:42:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 15:42:15 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 15:44:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 15:44:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 15:45:25 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ebee728d-e562-a260-e188-9a4ecf892dde (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4535022c00, cur 1565563525 expire 1565563375 last 1565563298 Aug 11 15:45:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 15:47:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 15:47:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 15:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 15:54:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 16:00:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 16:00:17 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 16:00:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 16:05:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 16:05:33 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 16:07:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c780262f-3cc8-3a64-7687-57b03b0275b7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e1e769800, cur 1565564849 expire 1565564699 last 1565564622 Aug 11 16:07:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 16:09:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 16:12:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 16:12:50 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 16:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 673a9294-d573-1a0a-c6cc-4dea7d944ff9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f26e9446400, cur 1565565221 expire 1565565071 last 1565564994 Aug 11 16:13:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 16:16:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 16:16:31 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 16:18:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c850932-a95c-b29e-f97e-a03cf60b1b18 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d00d55000, cur 1565565528 expire 1565565378 last 1565565301 Aug 11 16:18:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 16:21:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 16:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 16:24:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 16:26:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 16:26:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 16:28:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 16:34:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 16:34:29 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 16:37:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 16:37:36 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 16:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 514ee0e5-1322-836a-446c-377c3cf0b7de (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32e9e6ac00, cur 1565566667 expire 1565566517 last 1565566440 Aug 11 16:37:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 16:37:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 514ee0e5-1322-836a-446c-377c3cf0b7de (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33bcb08800, cur 1565566669 expire 1565566519 last 1565566442 Aug 11 16:37:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 16:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 16:45:09 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 16:45:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e95526ba-ef09-5a54-0126-2f764f5643e1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4075da2400, cur 1565567119 expire 1565566969 last 1565566892 Aug 11 16:45:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e95526ba-ef09-5a54-0126-2f764f5643e1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f39cc355c00, cur 1565567121 expire 1565566971 last 1565566894 Aug 11 16:52:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 16:52:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 16:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 16:55:21 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 17:03:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 17:03:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 17:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 17:06:04 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 17:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 78b49bd2-37f4-7ca8-414e-67b16d56741f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f191854e000, cur 1565568834 expire 1565568684 last 1565568607 Aug 11 17:13:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 17:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 17:14:02 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 17:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 78b49bd2-37f4-7ca8-414e-67b16d56741f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f191854e800, cur 1565568848 expire 1565568698 last 1565568621 Aug 11 17:14:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 17:19:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 17:19:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 17:19:22 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 17:24:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 17:24:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 17:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 17:29:41 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 17:36:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 11 17:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 30984eda-e161-71f7-ae0b-baf90484d411 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e88f32c00, cur 1565570233 expire 1565570083 last 1565570006 Aug 11 17:38:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 75957e97-7383-1e3d-e135-676699984c8e (at 10.9.114.12@o2ib4) in 194 seconds. I think it's dead, and I am evicting it. exp ffff8f19e139a400, cur 1565570309 expire 1565570159 last 1565570115 Aug 11 17:38:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 17:39:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 75957e97-7383-1e3d-e135-676699984c8e (at 10.9.114.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f16b7b7b400, cur 1565570342 expire 1565570192 last 1565570115 Aug 11 17:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 17:42:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 17:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 17:42:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 17:43:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1a04b447-5ce2-7243-1f7f-305fcef982f9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b0c317c00, cur 1565570593 expire 1565570443 last 1565570366 Aug 11 17:43:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 17:48:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 30ccbe99-c1f6-a853-7ba1-833b43a32fd5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2267962800, cur 1565570899 expire 1565570749 last 1565570672 Aug 11 17:48:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 17:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 30ccbe99-c1f6-a853-7ba1-833b43a32fd5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f253021d800, cur 1565570910 expire 1565570760 last 1565570683 Aug 11 17:48:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 17:54:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 17:54:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 17:54:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 17:54:29 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 18:00:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 18:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 18:05:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 18:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 18:05:13 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 18:05:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8c6e5d68-66e5-6464-5f97-0702932e386a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2859067800, cur 1565571947 expire 1565571797 last 1565571720 Aug 11 18:12:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 18:13:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 18:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 18:16:22 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 18:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 18:16:22 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 18:18:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 18:20:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 18:23:14 fir-md1-s1 kernel: Lustre: 23700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565572987/real 1565572987] req@ffff8f3149d85a00 x1636761833719856/t0(0) o104->fir-MDT0000@10.9.109.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565572994 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 18:23:14 fir-md1-s1 kernel: Lustre: 23700:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 55 previous similar messages Aug 11 18:23:24 fir-md1-s1 kernel: Lustre: 97672:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ba30cd100 x1631549832569632/t0(0) o101->362621d0-7ac3-9c5b-280e-e0d76da4f0b2@10.9.106.66@o2ib4:29/0 lens 584/3264 e 1 to 0 dl 1565573009 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 18:23:32 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f32ea0c9b00 x1638092132750592/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:7/0 lens 1856/3288 e 0 to 0 dl 1565573017 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 18:23:32 fir-md1-s1 kernel: Lustre: 20465:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 11 18:23:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 18:23:51 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f23cc791800 x1631568799357184/t0(0) o101->eafaef03-bf23-6214-eeef-c768f6a5fb7d@10.9.105.58@o2ib4:26/0 lens 584/3264 e 0 to 0 dl 1565573036 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 18:23:51 fir-md1-s1 kernel: Lustre: 21428:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Aug 11 18:23:56 fir-md1-s1 kernel: Lustre: 23700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for 
slow reply: [sent 1565573029/real 1565573029] req@ffff8f3149d85a00 x1636761833719856/t0(0) o104->fir-MDT0000@10.9.109.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565573036 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 11 18:23:56 fir-md1-s1 kernel: Lustre: 23700:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 11 18:24:25 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f09ed421200 x1631596594991808/t0(0) o101->4dbe6048-7f70-8f0f-700e-3b78f70d5297@10.9.108.12@o2ib4:0/0 lens 584/3264 e 0 to 0 dl 1565573070 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 18:24:25 fir-md1-s1 kernel: Lustre: 20738:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 31 previous similar messages Aug 11 18:24:38 fir-md1-s1 kernel: LustreError: 23726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565572988, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3ed0cade80/0x5d9ee6c4f25d43a6 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 64 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23726 timeout: 0 lvb_type: 0 Aug 11 18:24:38 fir-md1-s1 kernel: LustreError: 23726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 11 18:24:44 fir-md1-s1 kernel: LustreError: 23728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565572994, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2f3980b3c0/0x5d9ee6c4f26bb47f lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 66 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23728 timeout: 0 lvb_type: 0 Aug 11 18:24:44 fir-md1-s1 kernel: LustreError: 
23728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Aug 11 18:24:57 fir-md1-s1 kernel: LustreError: 21455:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565573006, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f17f49269c0/0x5d9ee6c4f2839301 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 68 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21455 timeout: 0 lvb_type: 0 Aug 11 18:24:57 fir-md1-s1 kernel: LustreError: 21455:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 11 18:25:13 fir-md1-s1 kernel: Lustre: 23700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565573106/real 1565573106] req@ffff8f3149d85a00 x1636761833719856/t0(0) o104->fir-MDT0000@10.9.109.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565573113 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 11 18:25:13 fir-md1-s1 kernel: Lustre: 23700:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Aug 11 18:25:15 fir-md1-s1 kernel: LustreError: 10150:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565573025, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1e3509c800/0x5d9ee6c4f295629a lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 69 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10150 timeout: 0 lvb_type: 0 Aug 11 18:25:15 fir-md1-s1 kernel: LustreError: 10150:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Aug 11 18:25:41 fir-md1-s1 kernel: LustreError: 23700:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.109.49@o2ib4) failed to reply to blocking AST (req@ffff8f3149d85a00 
x1636761833719856 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f4021ae4800/0x5d9ee6c4f24531bc lrc: 4/0,0 mode: PR/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 68 type: IBT flags: 0x60200400000020 nid: 10.9.109.49@o2ib4 remote: 0xf0e6308402b67886 expref: 1011 pid: 23630 timeout: 4688343 lvb_type: 0 Aug 11 18:25:41 fir-md1-s1 kernel: LustreError: 23700:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 55 previous similar messages Aug 11 18:25:41 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.109.49@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 11 18:25:41 fir-md1-s1 kernel: LustreError: Skipped 55 previous similar messages Aug 11 18:25:41 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.109.49@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f4021ae4800/0x5d9ee6c4f24531bc lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 68 type: IBT flags: 0x60200400000020 nid: 10.9.109.49@o2ib4 remote: 0xf0e6308402b67886 expref: 1012 pid: 23630 timeout: 0 lvb_type: 0 Aug 11 18:25:42 fir-md1-s1 kernel: Lustre: 23700:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:1s); client may timeout. req@ffff8f32ea0c9b00 x1638092132750592/t443797069775(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:7/0 lens 1856/1192 e 0 to 0 dl 1565573141 ref 1 fl Complete:/0/0 rc 0/0 Aug 11 18:25:42 fir-md1-s1 kernel: Lustre: 23700:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 53 previous similar messages Aug 11 18:25:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4af14e89-fa61-6fc9-8367-801335442192 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2adc3ed400, cur 1565573149 expire 1565572999 last 1565572922 Aug 11 18:25:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 18:27:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.29.6@o2ib6, removing former export from same NID Aug 11 18:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client baaf9aa6-d6ac-d219-ff91-f47dd67dd412 (at 10.8.29.6@o2ib6) reconnecting Aug 11 18:27:10 fir-md1-s1 kernel: Lustre: Skipped 189 previous similar messages Aug 11 18:27:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0af4f40a-317e-88ce-7d9c-c4839b78e5a4 (at 10.8.29.6@o2ib6) Aug 11 18:27:10 fir-md1-s1 kernel: Lustre: Skipped 191 previous similar messages Aug 11 18:27:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.29.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 18:28:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.29.6@o2ib6, removing former export from same NID Aug 11 18:38:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 18:38:12 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 18:38:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 18:38:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 18:38:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 49d730f2-c41e-9aa4-78ae-46011d374d9e (at 10.9.114.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f08e9044400, cur 1565573910 expire 1565573760 last 1565573683 Aug 11 18:38:30 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 18:43:32 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 18:43:32 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 50 previous similar messages Aug 11 18:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 18:50:13 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 18:50:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 18:50:13 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 18:50:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 18:51:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 19:01:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 19:01:02 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 19:01:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 19:01:02 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 19:05:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 19:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 19:11:21 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 19:11:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 19:11:21 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 19:17:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 19:21:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 19:21:29 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 19:21:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 19:21:29 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 19:22:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 19:27:40 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:27:40 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 23 previous similar messages Aug 
11 19:27:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:27:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 7 previous similar messages Aug 11 19:27:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.28.2@o2ib6, removing former export from same NID Aug 11 19:27:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:27:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 11 19:27:59 fir-md1-s1 kernel: Lustre: 20732:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565576872/real 0] req@ffff8f341e615a00 x1636761862752160/t0(0) o104->fir-MDT0000@10.8.26.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565576879 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:27:59 fir-md1-s1 kernel: Lustre: 20732:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 11 19:27:59 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f19659f0050 x1634625489247536/t0(0) o4->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:11/0 lens 488/448 e 1 to 0 dl 1565576891 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 19:27:59 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 12 previous similar messages Aug 11 19:28:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.29.1@o2ib6, removing former export from same NID Aug 11 19:28:00 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c2f817000 Aug 11 19:28:00 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Bulk IO write error with 46725c7e-13ed-427c-fac8-b2b98cb851a6 (at 10.8.17.12@o2ib6), client will retry: rc = -110 Aug 11 19:28:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f4260853600 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2089d9d000 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b8ba53c00 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3505344e00 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2988baba00 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c2f815400 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2699456200 Aug 11 19:28:00 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32b6f6a800 Aug 11 19:28:01 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2502fd5c00 Aug 11 19:28:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 07168079-701a-2bc9-f830-980c9c4453ed (at 10.8.21.4@o2ib6), client will retry: rc -110 Aug 11 19:28:01 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages Aug 11 19:28:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.22@o2ib6, removing former export from same NID Aug 11 19:28:04 fir-md1-s1 kernel: Lustre: Skipped 150 previous similar messages Aug 11 19:28:08 fir-md1-s1 kernel: 
LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e95a0aa00 Aug 11 19:28:08 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2501aed400 Aug 11 19:28:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.8.9@o2ib6, removing former export from same NID Aug 11 19:28:12 fir-md1-s1 kernel: Lustre: Skipped 218 previous similar messages Aug 11 19:28:13 fir-md1-s1 kernel: Lustre: 21040:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2959d35450 x1631587193130480/t0(0) o3->a2c44fb9-486a-447c-ab16-c5c889d1e2f3@10.8.27.3@o2ib6:18/0 lens 488/440 e 1 to 0 dl 1565576898 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:28:13 fir-md1-s1 kernel: Lustre: 21040:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Aug 11 19:28:15 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:28:15 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 16 previous similar messages Aug 11 19:28:15 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f40e8fcb400 Aug 11 19:28:16 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d51b1be00 Aug 11 19:28:18 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19c9292e00 Aug 11 19:28:21 fir-md1-s1 kernel: Lustre: 21460:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565576894/real 0] req@ffff8f1804996600 x1636761862958512/t0(0) o104->fir-MDT0000@10.8.7.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565576901 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:28:21 fir-md1-s1 
kernel: Lustre: 21460:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages Aug 11 19:28:22 fir-md1-s1 kernel: Lustre: 49463:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f21c2f850 x1631918495650656/t0(0) o3->cd075b36-33db-5052-abc8-0d1d7f478890@10.8.30.8@o2ib6:27/0 lens 488/440 e 1 to 0 dl 1565576907 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:28:22 fir-md1-s1 kernel: Lustre: 49463:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Aug 11 19:28:23 fir-md1-s1 kernel: Lustre: 20732:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8f28953ae900 x1641512157440288/t443799006686(0) o36->4cf7a050-aa92-e42e-d5ec-7378bc570efd@10.8.25.10@o2ib6:22/0 lens 504/424 e 0 to 0 dl 1565576902 ref 1 fl Complete:/0/0 rc 0/0 Aug 11 19:28:26 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f361b4ca200 Aug 11 19:28:27 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19bfa1d800 Aug 11 19:28:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.29@o2ib6, removing former export from same NID Aug 11 19:28:28 fir-md1-s1 kernel: Lustre: Skipped 145 previous similar messages Aug 11 19:28:29 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f453b223400 Aug 11 19:28:29 fir-md1-s1 kernel: Lustre: 24565:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. 
req@ffff8f2fe0cfd850 x1641541385997504/t0(0) o3->d445ca69-c296-26cf-8b68-836eee5dfcec@10.8.30.2@o2ib6:25/0 lens 488/440 e 1 to 0 dl 1565576905 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 19:28:29 fir-md1-s1 kernel: Lustre: 24565:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 11 19:28:30 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2195d80000 Aug 11 19:28:33 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3d3037d200 Aug 11 19:28:38 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3758282a00 Aug 11 19:28:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 1f6dcbe1-0bdc-a36f-a698-e7085eab26b7 (at 10.8.11.7@o2ib6), client will retry: rc -110 Aug 11 19:28:38 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 19:28:38 fir-md1-s1 kernel: Lustre: 46551:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); client may timeout. 
req@ffff8f2e227b5850 x1632260152579344/t0(0) o3->1f6dcbe1-0bdc-a36f-a698-e7085eab26b7@10.8.11.7@o2ib6:0/0 lens 488/440 e 1 to 0 dl 1565576910 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 19:28:38 fir-md1-s1 kernel: Lustre: 46551:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 11 19:28:38 fir-md1-s1 kernel: Lustre: 21322:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f39037e8900 x1631586900858640/t0(0) o101->f070aa79-4085-01c4-e45c-5c90a853bda7@10.9.106.25@o2ib4:13/0 lens 576/3264 e 0 to 0 dl 1565576923 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:28:38 fir-md1-s1 kernel: Lustre: 21322:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Aug 11 19:28:40 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f39e08fdc00 Aug 11 19:28:42 fir-md1-s1 kernel: LustreError: 49475:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2b23b57450 x1635363475716752/t0(0) o3->e90e26e9-54e6-7601-c634-05b1cc133462@10.8.18.18@o2ib6:12/0 lens 488/440 e 0 to 0 dl 1565576922 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 19:28:42 fir-md1-s1 kernel: LustreError: 49475:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 17 previous similar messages Aug 11 19:28:43 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b35645e00 Aug 11 19:28:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.7.11@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1716712d00/0x5d9ee6c50534bfd1 lrc: 4/0,0 mode: PR/PR res: [0x200029e09:0x3a:0x0].0x0 bits 0x13/0x0 rrc: 88 type: IBT flags: 0x60200400000020 nid: 10.8.7.11@o2ib6 remote: 0xd681fd3bbf2a78da expref: 1187 pid: 97646 timeout: 4691983 lvb_type: 0 Aug 11 19:28:48 fir-md1-s1 kernel: 
LustreError: 21042:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2959d36850 x1631585475639392/t0(0) o3->8776df39-692a-3df9-0874-72e441440742@10.8.18.22@o2ib6:23/0 lens 488/440 e 0 to 0 dl 1565576933 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 19:28:48 fir-md1-s1 kernel: LustreError: 21042:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 17 previous similar messages Aug 11 19:28:49 fir-md1-s1 kernel: LustreError: 25086:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.7.11@o2ib6 arrived at 1565576929 with bad export cookie 6746082289101629079 Aug 11 19:28:51 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:28:51 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 10 previous similar messages Aug 11 19:28:51 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20e1b06000 Aug 11 19:28:52 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f25313ef050 x1640004714643632/t0(0) o3->c6ad836b-033a-3280-2441-b2ee1433cb42@10.8.18.17@o2ib6:22/0 lens 488/440 e 0 to 0 dl 1565576932 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 19:28:52 fir-md1-s1 kernel: LustreError: 21449:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Aug 11 19:28:57 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 52s: evicting client at 10.8.25.12@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f24fe5986c0/0x5d9ee6c505430e8a lrc: 4/0,0 mode: PR/PR res: [0x20002a02f:0x100e:0x0].0x0 bits 0x1b/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.8.25.12@o2ib6 remote: 0xc3daedb1e54c782a expref: 4209 pid: 97660 timeout: 4691997 lvb_type: 0 Aug 11 19:28:57 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) 
Skipped 1 previous similar message Aug 11 19:28:58 fir-md1-s1 kernel: LustreError: 31011:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.7.11@o2ib6 arrived at 1565576938 with bad export cookie 6746082289101629079 Aug 11 19:29:00 fir-md1-s1 kernel: Lustre: 21667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565576933/real 0] req@ffff8f21b4c88600 x1636761863314640/t0(0) o104->fir-MDT0000@10.8.30.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565576940 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:29:00 fir-md1-s1 kernel: Lustre: 21667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 11 19:29:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.13.29@o2ib6, removing former export from same NID Aug 11 19:29:00 fir-md1-s1 kernel: Lustre: Skipped 295 previous similar messages Aug 11 19:29:07 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22d86a1800 Aug 11 19:29:07 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:15s); client may timeout. 
req@ffff8f25313ef050 x1640004714643632/t0(0) o3->c6ad836b-033a-3280-2441-b2ee1433cb42@10.8.18.17@o2ib6:22/0 lens 488/440 e 0 to 0 dl 1565576932 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 19:29:07 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Aug 11 19:29:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 57s: evicting client at 10.8.26.28@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f32cb2f7740/0x5d9ee6c5054b1779 lrc: 4/0,0 mode: PR/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: 0x60200400000020 nid: 10.8.26.28@o2ib6 remote: 0xff0b1f607b1120a1 expref: 3155 pid: 97646 timeout: 4692007 lvb_type: 0 Aug 11 19:29:13 fir-md1-s1 kernel: Lustre: 27583:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d06734050 x1631683071765216/t0(0) o3->4922856f-fe57-7196-28b1-c0fb66220ebe@10.8.21.7@o2ib6:18/0 lens 488/440 e 0 to 0 dl 1565576958 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:29:13 fir-md1-s1 kernel: Lustre: 27583:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Aug 11 19:29:18 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1d06734050 x1631683071765216/t0(0) o3->4922856f-fe57-7196-28b1-c0fb66220ebe@10.8.21.7@o2ib6:18/0 lens 488/440 e 0 to 0 dl 1565576958 ref 1 fl Interpret:/0/0 rc 0/0 Aug 11 19:29:18 fir-md1-s1 kernel: LustreError: 46581:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 5 previous similar messages Aug 11 19:29:19 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22977b5600 Aug 11 19:29:22 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3e4fc3a400 Aug 11 19:29:22 fir-md1-s1 kernel: 
LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0cbc9b8e00 Aug 11 19:29:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.30.35@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1e4779f980/0x5d9ee6c504e5720a lrc: 4/0,0 mode: PR/PR res: [0x200029ae3:0x5e:0x0].0x0 bits 0x13/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.8.30.35@o2ib6 remote: 0xb789b8dfe49782c5 expref: 496 pid: 21455 timeout: 4692022 lvb_type: 0 Aug 11 19:29:22 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 11 19:29:24 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2fa3f22800 Aug 11 19:29:40 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f318c2f3e00 Aug 11 19:29:40 fir-md1-s1 kernel: Lustre: 24568:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:44s); client may timeout. 
req@ffff8f2e941c3050 x1634520908111760/t0(0) o3->7a7a90f2-46dd-49dc-cc68-9ea5ca5dbef1@10.8.13.5@o2ib6:26/0 lens 488/440 e 1 to 0 dl 1565576936 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 19:29:40 fir-md1-s1 kernel: Lustre: 24568:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Aug 11 19:29:40 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.33@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2d20809200/0x5d9ee6c5053b75a2 lrc: 4/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 724 type: IBT flags: 0x60200400000020 nid: 10.8.27.33@o2ib6 remote: 0xb9bfdb606218279c expref: 257 pid: 23581 timeout: 4692040 lvb_type: 0 Aug 11 19:29:41 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17c70ed000 Aug 11 19:29:41 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1e99680600 Aug 11 19:29:47 fir-md1-s1 kernel: LustreError: 23597:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565576897, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3483a0ec00/0x5d9ee6c5055796b8 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 34 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23597 timeout: 0 lvb_type: 0 Aug 11 19:29:47 fir-md1-s1 kernel: LustreError: 23597:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 26 previous similar messages Aug 11 19:29:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f23e06eca40/0x5d9ee6c5055f0346 lrc: 4/0,0 mode: PR/PR res: [0x2c002c57f:0x18741:0x0].0x0 bits 0x13/0x0 rrc: 
68 type: IBT flags: 0x60200400000020 nid: 10.8.27.11@o2ib6 remote: 0x1e0a9394e0bac20d expref: 128 pid: 23738 timeout: 4692059 lvb_type: 0 Aug 11 19:29:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Aug 11 19:30:01 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:30:01 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages Aug 11 19:30:01 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a80597e00 Aug 11 19:30:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with f18f5844-4ec0-3cde-21e1-0f1a02440d5a (at 10.8.17.1@o2ib6), client will retry: rc -110 Aug 11 19:30:01 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 19:30:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.2.23@o2ib6, removing former export from same NID Aug 11 19:30:04 fir-md1-s1 kernel: Lustre: Skipped 867 previous similar messages Aug 11 19:30:08 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f376623fc00 Aug 11 19:30:11 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f242799ee00 Aug 11 19:30:13 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2590872800 Aug 11 19:30:22 fir-md1-s1 kernel: Lustre: 23748:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26cc4a0f00 x1631549084749056/t0(0) o101->214bcacf-deef-8b1a-7220-98313adef1de@10.9.102.36@o2ib4:27/0 lens 584/3264 e 0 to 0 dl 1565577027 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:30:22 fir-md1-s1 kernel: Lustre: 
23748:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 112 previous similar messages Aug 11 19:30:28 fir-md1-s1 kernel: Lustre: 21667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565577021/real 0] req@ffff8f3eee954e00 x1636761863819040/t0(0) o104->fir-MDT0000@10.8.2.28@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565577028 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:30:28 fir-md1-s1 kernel: Lustre: 21667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Aug 11 19:30:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a26f16000 Aug 11 19:30:32 fir-md1-s1 kernel: LNetError: 31007:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.15.3@o2ib6 from 10.0.10.51@o2ib7 Aug 11 19:30:32 fir-md1-s1 kernel: LNetError: 31007:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 7 previous similar messages Aug 11 19:30:41 fir-md1-s1 kernel: LustreError: 23722:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565576951, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0b27372880/0x5d9ee6c505701178 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 761 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23722 timeout: 0 lvb_type: 0 Aug 11 19:30:41 fir-md1-s1 kernel: LustreError: 23722:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 11 19:30:50 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565576960, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3245e35580/0x5d9ee6c5057392f2 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 
0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21003 timeout: 0 lvb_type: 0 Aug 11 19:30:50 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 92 previous similar messages Aug 11 19:31:02 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f13d7186600 Aug 11 19:31:02 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:82s); client may timeout. req@ffff8f1d06736c50 x1631683071904352/t0(0) o3->4922856f-fe57-7196-28b1-c0fb66220ebe@10.8.21.7@o2ib6:10/0 lens 488/440 e 0 to 0 dl 1565576980 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 11 19:31:02 fir-md1-s1 kernel: Lustre: 21449:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Aug 11 19:31:16 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f26cb7fe400 Aug 11 19:31:22 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3a06ea7e00 Aug 11 19:31:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2683ad34-c98d-003f-7a4a-f3d0c48493e3 (at 10.8.25.12@o2ib6) Aug 11 19:31:29 fir-md1-s1 kernel: Lustre: Skipped 6370 previous similar messages Aug 11 19:31:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6) reconnecting Aug 11 19:31:29 fir-md1-s1 kernel: Lustre: Skipped 3685 previous similar messages Aug 11 19:31:31 fir-md1-s1 kernel: LNet: Service thread pid 23678 was inactive for 200.75s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 11 19:31:31 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 11 19:31:31 fir-md1-s1 kernel: Pid: 23678, comm: mdt02_071 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:31:31 fir-md1-s1 kernel: Call Trace: Aug 11 19:31:31 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 11 19:31:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:31:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:31:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:31:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:31:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577091.23678 Aug 11 19:31:32 fir-md1-s1 kernel: LustreError: 23738:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565577002, 90s ago); not entering recovery in 
server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2bef68c140/0x5d9ee6c505817e3f lrc: 3/1,0 mode: --/PR res: [0x2c002c57f:0x18741:0x0].0x0 bits 0x13/0x0 rrc: 69 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23738 timeout: 0 lvb_type: 0 Aug 11 19:31:32 fir-md1-s1 kernel: LustreError: 23738:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 11 19:31:37 fir-md1-s1 kernel: LNet: Service thread pid 23597 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:31:37 fir-md1-s1 kernel: Pid: 23597, comm: mdt02_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:31:37 fir-md1-s1 kernel: Call Trace: Aug 11 19:31:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:31:37 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:31:37 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:31:37 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:31:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:31:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:31:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:31:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:31:37 fir-md1-s1 
kernel: [] 0xffffffffffffffff Aug 11 19:31:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577097.23597 Aug 11 19:31:48 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f31a165f600 Aug 11 19:32:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.29@o2ib6, removing former export from same NID Aug 11 19:32:13 fir-md1-s1 kernel: Lustre: Skipped 1456 previous similar messages Aug 11 19:32:14 fir-md1-s1 kernel: LNet: Service thread pid 21667 was inactive for 200.63s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:32:14 fir-md1-s1 kernel: Pid: 21667, comm: mdt03_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:32:14 fir-md1-s1 kernel: Call Trace: Aug 11 19:32:14 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 11 19:32:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:32:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:32:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:32:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:32:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577134.21667 Aug 11 19:32:18 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:32:18 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 10 previous similar messages Aug 11 19:32:34 fir-md1-s1 kernel: LustreError: 23584:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565577064, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2684a860c0/0x5d9ee6c505922b51 lrc: 3/0,1 mode: --/CW res: [0x200029791:0x7f50:0x0].0x0 bits 0x2/0x0 rrc: 728 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23584 timeout: 0 lvb_type: 0 Aug 11 19:32:34 fir-md1-s1 kernel: LustreError: 23671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565577064, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2e4424b840/0x5d9ee6c505922c85 lrc: 3/0,1 mode: --/CW res: [0x200029791:0x7f50:0x0].0x0 bits 0x2/0x0 rrc: 724 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23671 timeout: 0 lvb_type: 0 Aug 11 19:32:34 fir-md1-s1 kernel: LustreError: 23671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 11 19:32:34 fir-md1-s1 kernel: LustreError: 23584:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 83 previous similar messages Aug 11 19:32:37 fir-md1-s1 kernel: Lustre: 
20466:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f39037ea700 x1631587050499680/t0(0) o101->409782ab-594c-0837-10bd-459bd6e52b7f@10.9.106.26@o2ib4:12/0 lens 584/3264 e 0 to 0 dl 1565577162 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:32:37 fir-md1-s1 kernel: Lustre: 20466:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 130 previous similar messages Aug 11 19:32:40 fir-md1-s1 kernel: LNet: Service thread pid 21003 was inactive for 200.42s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:32:40 fir-md1-s1 kernel: Pid: 21003, comm: mdt02_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:32:40 fir-md1-s1 kernel: Call Trace: Aug 11 19:32:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:32:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:32:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:32:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:32:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:32:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:32:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:32:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:32:40 fir-md1-s1 kernel: [] 
0xffffffffffffffff Aug 11 19:32:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577160.21003 Aug 11 19:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e9c5a421-c400-967f-fe3d-134ef9bd0037 (at 10.8.7.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3ebb3ad000, cur 1565577170 expire 1565577020 last 1565576943 Aug 11 19:32:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 19:32:50 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 106s: evicting client at 10.8.28.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f17c50da640/0x5d9ee6c505705270 lrc: 4/0,0 mode: PR/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 727 type: IBT flags: 0x60200400000020 nid: 10.8.28.8@o2ib6 remote: 0x7180114cdac0b60f expref: 961 pid: 10146 timeout: 4692153 lvb_type: 0 Aug 11 19:32:50 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Aug 11 19:33:04 fir-md1-s1 kernel: LNet: Service thread pid 23597 completed after 287.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:33:04 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 11 19:33:20 fir-md1-s1 kernel: LNetError: 30993:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.15.3@o2ib6 from 10.0.10.51@o2ib7 Aug 11 19:33:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2af3b0c8-7f33-1b36-e691-19b3060df1cb (at 10.8.7.22@o2ib6) in 171 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34eb254400, cur 1565577210 expire 1565577060 last 1565577039 Aug 11 19:33:30 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 19:33:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 44s: evicting client at 10.8.28.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f26f7cef740/0x5d9ee6c4d966e253 lrc: 4/0,0 mode: PR/PR res: [0x2c002c2b0:0x127e2:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.28.4@o2ib6 remote: 0x9fc2c3f0f4d6600d expref: 187 pid: 23746 timeout: 4692258 lvb_type: 0 Aug 11 19:33:33 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 11 19:33:42 fir-md1-s1 kernel: LustreError: 10150:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565577132, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f44876f4a40/0x5d9ee6c505a29614 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f50:0x0].0x0 bits 0x13/0x0 rrc: 790 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 10150 timeout: 0 lvb_type: 0 Aug 11 19:33:42 fir-md1-s1 kernel: LustreError: 10150:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Aug 11 19:33:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d8428b3f-ceef-fb57-6c0a-b3ad15aaf988 (at 10.8.27.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f18656b6000, cur 1565577227 expire 1565577077 last 1565577000 Aug 11 19:33:47 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 19:34:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ec9719ae-e98d-245f-cb43-8c61dda19eb4 (at 10.8.18.29@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. 
exp ffff8f258467ec00, cur 1565577246 expire 1565577096 last 1565577022 Aug 11 19:34:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 19:34:13 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.6@o2ib6 arrived at 1565577253 with bad export cookie 6746082742713466828 Aug 11 19:34:13 fir-md1-s1 kernel: LustreError: 25029:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 4 previous similar messages Aug 11 19:34:25 fir-md1-s1 kernel: LNet: Service thread pid 23747 was inactive for 200.53s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:34:25 fir-md1-s1 kernel: Pid: 23747, comm: mdt02_098 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:34:25 fir-md1-s1 kernel: Call Trace: Aug 11 19:34:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:34:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:34:25 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:34:25 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:34:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:34:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:34:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:34:25 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:34:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:34:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577265.23747 Aug 11 19:34:25 fir-md1-s1 kernel: LNet: Service thread pid 20738 was inactive for 200.68s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 19:34:26 fir-md1-s1 kernel: Lustre: 23727:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565577259/real 0] req@ffff8f3c7c75e300 x1636761865015696/t0(0) o104->fir-MDT0000@10.8.0.67@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565577266 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:34:26 fir-md1-s1 kernel: Lustre: 23727:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Aug 11 19:34:35 fir-md1-s1 kernel: LNet: Service thread pid 23638 was inactive for 200.39s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 19:34:35 fir-md1-s1 kernel: LNet: Skipped 33 previous similar messages Aug 11 19:34:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577275.23638 Aug 11 19:34:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.67@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1d54fb8480/0x5d9ee6c5055c78a8 lrc: 4/0,0 mode: PR/PR res: [0x200029bfc:0xd9f7:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.67@o2ib6 remote: 0xb7325a020ca47ec2 expref: 456 pid: 26253 timeout: 4692348 lvb_type: 0 Aug 11 19:34:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Aug 11 19:34:57 fir-md1-s1 kernel: LNet: Service thread pid 21667 completed after 363.78s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 11 19:35:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1d0afa42-1868-88a7-3a68-fe8e7630a651 (at 10.8.30.30@o2ib6) in 167 seconds. I think it's dead, and I am evicting it. exp ffff8f2e4422ac00, cur 1565577303 expire 1565577153 last 1565577136 Aug 11 19:35:03 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 19:35:16 fir-md1-s1 kernel: LNet: Service thread pid 23751 was inactive for 200.55s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 19:35:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577316.23751 Aug 11 19:35:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f0a8fbb7-06c4-ed16-a94f-6cea310ceb29 (at 10.8.0.82@o2ib6) in 196 seconds. I think it's dead, and I am evicting it. exp ffff8f1fa698b400, cur 1565577322 expire 1565577172 last 1565577126 Aug 11 19:35:22 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 11 19:35:32 fir-md1-s1 kernel: LNet: Service thread pid 10150 was inactive for 200.14s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 19:35:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577332.10150 Aug 11 19:35:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577333.23667 Aug 11 19:35:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577339.21372 Aug 11 19:36:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577364.21452 Aug 11 19:36:05 fir-md1-s1 kernel: LNet: Service thread pid 24578 was inactive for 200.28s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 11 19:36:05 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 11 19:36:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577365.24578 Aug 11 19:36:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577366.10559 Aug 11 19:36:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577369.50581 Aug 11 19:36:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577370.21311 Aug 11 19:36:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 85ad41d9-b5d7-9f65-161e-448c1b0b3f98 (at 10.8.1.25@o2ib6) in 225 seconds. I think it's dead, and I am evicting it. exp ffff8f1489f50800, cur 1565577382 expire 1565577232 last 1565577157 Aug 11 19:36:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577382.23580 Aug 11 19:36:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.2@o2ib6, removing former export from same NID Aug 11 19:36:29 fir-md1-s1 kernel: Lustre: Skipped 1525 previous similar messages Aug 11 19:36:34 fir-md1-s1 kernel: LNet: Service thread pid 50582 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 11 19:36:34 fir-md1-s1 kernel: Pid: 50582, comm: mdt02_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:36:34 fir-md1-s1 kernel: Call Trace: Aug 11 19:36:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:36:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:36:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:36:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:36:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:36:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:36:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:36:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:36:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:36:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577394.50582 Aug 11 19:36:34 fir-md1-s1 kernel: LustreError: 23726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565577304, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0ec12b18c0/0x5d9ee6c505e8f582 lrc: 3/1,0 mode: --/PR res: [0x2c002be03:0x5efd:0x0].0x0 bits 0x13/0x0 rrc: 38 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23726 timeout: 0 lvb_type: 0 
Aug 11 19:36:34 fir-md1-s1 kernel: LustreError: 23726:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 99 previous similar messages Aug 11 19:36:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.15.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 19:36:54 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f24f69ecb00 x1634129187468176/t0(0) o101->437db638-1a8f-d9e7-3d4a-b386602e77f0@10.9.102.35@o2ib4:29/0 lens 576/3264 e 0 to 0 dl 1565577419 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:36:54 fir-md1-s1 kernel: Lustre: 97647:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 137 previous similar messages Aug 11 19:36:54 fir-md1-s1 kernel: LNet: Service thread pid 23649 was inactive for 200.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:36:54 fir-md1-s1 kernel: Pid: 23649, comm: mdt00_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:36:54 fir-md1-s1 kernel: Call Trace: Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] 
Aug 11 19:36:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:36:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:36:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:36:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577414.23649 Aug 11 19:36:54 fir-md1-s1 kernel: Pid: 23687, comm: mdt00_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:36:54 fir-md1-s1 kernel: Call Trace: Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:36:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:36:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:36:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:36:54 fir-md1-s1 kernel: Pid: 25680, comm: mdt02_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 
SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:36:54 fir-md1-s1 kernel: Call Trace: Aug 11 19:36:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:36:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:36:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:36:55 fir-md1-s1 kernel: Pid: 21678, comm: mdt03_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:36:55 fir-md1-s1 kernel: Call Trace: Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 
[mdt] Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:36:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:36:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:36:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:37:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577421.10195 Aug 11 19:37:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.15.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 19:37:12 fir-md1-s1 kernel: LNet: Service thread pid 27319 completed after 367.73s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:37:12 fir-md1-s1 kernel: LNet: Skipped 71 previous similar messages Aug 11 19:37:39 fir-md1-s1 kernel: LNet: Service thread pid 23727 was inactive for 200.42s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 19:37:39 fir-md1-s1 kernel: LNet: Skipped 108 previous similar messages Aug 11 19:37:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577459.23727 Aug 11 19:37:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2af3b0c8-7f33-1b36-e691-19b3060df1cb (at 10.8.7.22@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2452d14800, cur 1565577465 expire 1565577315 last 1565577241 Aug 11 19:37:45 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 19:37:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.31.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f340e367980/0x5d9ee6c5049338f5 lrc: 4/0,0 mode: PR/PR res: [0x200029866:0x16947:0x0].0x0 bits 0x13/0x0 rrc: 43 type: IBT flags: 0x60200400000020 nid: 10.8.31.4@o2ib6 remote: 0xe81f6dd57dbd22a1 expref: 413 pid: 23685 timeout: 4692525 lvb_type: 0 Aug 11 19:37:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Aug 11 19:38:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577504.23726 Aug 11 19:38:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577505.21430 Aug 11 19:38:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577507.23676 Aug 11 19:38:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577508.23745 Aug 11 19:38:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577511.23569 Aug 11 19:38:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577517.23688 Aug 11 19:38:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577520.21667 Aug 11 19:38:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577521.23622 Aug 11 19:38:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577522.26257 Aug 11 19:38:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577526.21380 Aug 11 19:38:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577532.22287 Aug 11 19:38:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577535.21676 Aug 11 19:38:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577537.21371 Aug 11 19:39:01 
fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f2f779cf-d459-667d-6b56-c14a76db50bb (at 10.8.27.23@o2ib6) in 223 seconds. I think it's dead, and I am evicting it. exp ffff8f237ebdd800, cur 1565577541 expire 1565577391 last 1565577318 Aug 11 19:39:01 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 19:39:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577545.23617 Aug 11 19:39:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.15.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 19:40:20 fir-md1-s1 kernel: LNet: Service thread pid 10362 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 11 19:40:20 fir-md1-s1 kernel: LNet: Skipped 15 previous similar messages Aug 11 19:40:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577620.10362 Aug 11 19:40:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577636.23736 Aug 11 19:40:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577637.23641 Aug 11 19:40:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577638.21460 Aug 11 19:40:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577639.23619 Aug 11 19:40:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577640.21411 Aug 11 19:40:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577641.23639 Aug 11 19:40:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577642.97648 Aug 11 19:40:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577643.21461 Aug 11 19:40:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577644.50582 Aug 11 19:40:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577645.10333 Aug 11 19:40:47 fir-md1-s1 kernel: LustreError: dumping log 
to /tmp/lustre-log.1565577647.23615 Aug 11 19:40:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577659.20465 Aug 11 19:41:01 fir-md1-s1 kernel: LustreError: 23685:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565577571, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2a3f233f00/0x5d9ee6c5063897f8 lrc: 3/1,0 mode: --/PR res: [0x2c002c2b0:0x127e2:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23685 timeout: 0 lvb_type: 0 Aug 11 19:41:01 fir-md1-s1 kernel: LustreError: 23685:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 116 previous similar messages Aug 11 19:41:22 fir-md1-s1 kernel: Lustre: 20731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565577675/real 0] req@ffff8f0d75b81b00 x1636761867357920/t0(0) o104->fir-MDT0000@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565577682 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:41:22 fir-md1-s1 kernel: Lustre: 20731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 47 previous similar messages Aug 11 19:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 132f4c39-1cb8-880d-7e4c-e4771d0a3e97 (at 10.8.20.22@o2ib6) Aug 11 19:41:29 fir-md1-s1 kernel: Lustre: Skipped 15795 previous similar messages Aug 11 19:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client fb4eab6b-7253-4fde-536d-07e03dd4756a (at 10.8.21.35@o2ib6) reconnecting Aug 11 19:41:29 fir-md1-s1 kernel: Lustre: Skipped 11683 previous similar messages Aug 11 19:41:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2af3b0c8-7f33-1b36-e691-19b3060df1cb (at 10.8.7.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f399a36d400, cur 1565577700 expire 1565577550 last 1565577473 Aug 11 19:41:40 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 11 19:42:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.15.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 19:42:36 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:42:51 fir-md1-s1 kernel: LNet: Service thread pid 23685 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:42:51 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 11 19:42:51 fir-md1-s1 kernel: Pid: 23685, comm: mdt02_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:42:51 fir-md1-s1 kernel: Call Trace: Aug 11 19:42:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:42:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:42:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:42:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:42:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:42:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:42:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:42:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:42:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:42:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577771.23685 Aug 11 19:43:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.15.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 19:43:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.15.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 19:43:21 fir-md1-s1 kernel: Pid: 23728, comm: mdt02_086 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:43:21 fir-md1-s1 kernel: Call Trace: Aug 11 19:43:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:43:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:43:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:43:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:43:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:43:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:43:21 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.67@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f286100e300/0x5d9ee6c502f91214 lrc: 4/0,0 mode: PR/PR res: [0x2c0024246:0xe:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.0.67@o2ib6 remote: 0xb7325a020ca22ecb expref: 334 pid: 21181 timeout: 4692861 lvb_type: 0 Aug 11 19:43:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 31 previous similar messages Aug 11 19:43:21 fir-md1-s1 kernel: LustreError: 20378:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0b91a5fb00 x1636761867804208/t0(0) o104->fir-MDT0002@10.8.0.67@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 11 19:43:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:43:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:43:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:43:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577801.23728 Aug 11 19:43:36 fir-md1-s1 kernel: LNet: Service thread pid 50581 completed after 647.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:43:36 fir-md1-s1 kernel: LNet: Skipped 50 previous similar messages Aug 11 19:44:36 fir-md1-s1 kernel: LNet: Service thread pid 20731 was inactive for 200.73s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 11 19:44:36 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 11 19:44:36 fir-md1-s1 kernel: Pid: 20731, comm: mdt01_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:44:36 fir-md1-s1 kernel: Call Trace: Aug 11 19:44:36 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:44:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:44:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:44:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:44:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:44:36 fir-md1-s1 kernel: LustreError: dumping log 
to /tmp/lustre-log.1565577876.20731 Aug 11 19:45:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.18@o2ib6, removing former export from same NID Aug 11 19:45:01 fir-md1-s1 kernel: Lustre: Skipped 2983 previous similar messages Aug 11 19:45:30 fir-md1-s1 kernel: LNet: Service thread pid 23727 completed after 671.30s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:45:30 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 11 19:46:13 fir-md1-s1 kernel: Pid: 27319, comm: mdt00_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:46:13 fir-md1-s1 kernel: Call Trace: Aug 11 19:46:13 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_create+0x569/0x1090 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 11 19:46:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:46:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:46:13 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:46:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:46:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565577973.27319 Aug 11 19:46:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 05c8b6b2-04ac-c002-5530-092914937d78 (at 10.8.1.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e3f3e8000, cur 1565577987 expire 1565577837 last 1565577760 Aug 11 19:46:27 fir-md1-s1 kernel: Lustre: Skipped 100 previous similar messages Aug 11 19:46:36 fir-md1-s1 kernel: LNet: Service thread pid 50444 completed after 552.26s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:46:36 fir-md1-s1 kernel: Lustre: 23647:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (557:1s); client may timeout. req@ffff8f2c34ef4200 x1641142078638880/t0(0) o101->953cdd8f-f1b4-74a1-72f4-aa3c283ed414@10.9.105.24@o2ib4:18/0 lens 584/536 e 0 to 0 dl 1565577995 ref 1 fl Complete:/0/0 rc 0/0 Aug 11 19:46:36 fir-md1-s1 kernel: Lustre: 23647:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 14 previous similar messages Aug 11 19:46:36 fir-md1-s1 kernel: LNet: Skipped 79 previous similar messages Aug 11 19:47:02 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-6), not sending early reply req@ffff8f322bddfb00 x1638092137082560/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:6/0 lens 584/3264 e 0 to 0 dl 1565578026 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:47:02 fir-md1-s1 kernel: Lustre: 23077:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 140 previous similar messages Aug 11 19:47:49 fir-md1-s1 kernel: LNet: Service thread pid 23674 was inactive for 200.17s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 11 19:47:49 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 11 19:47:49 fir-md1-s1 kernel: Pid: 23674, comm: mdt03_080 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:47:49 fir-md1-s1 kernel: Call Trace: Aug 11 19:47:49 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:47:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:47:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:47:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:47:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:47:49 fir-md1-s1 kernel: LustreError: dumping log 
to /tmp/lustre-log.1565578069.23674 Aug 11 19:49:49 fir-md1-s1 kernel: LNet: Service thread pid 23555 completed after 1125.27s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:49:57 fir-md1-s1 kernel: Pid: 25675, comm: mdt02_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:49:57 fir-md1-s1 kernel: Call Trace: Aug 11 19:49:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:49:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:49:57 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:49:57 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:49:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:49:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:49:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:49:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:49:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:49:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578197.25675 Aug 11 19:51:02 fir-md1-s1 kernel: Pid: 10364, comm: mdt03_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:51:02 fir-md1-s1 kernel: Call Trace: Aug 11 19:51:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:51:02 
fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:51:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:51:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:51:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:51:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:51:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:51:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:51:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:51:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:51:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578262.10364 Aug 11 19:51:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 89f18c45-2db3-8ed2-9878-a7776bbc1369 (at 10.8.26.18@o2ib6) reconnecting Aug 11 19:51:29 fir-md1-s1 kernel: Lustre: Skipped 12022 previous similar messages Aug 11 19:51:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8cd7f013-9a17-1cbd-e8a4-7acaf2642e05 (at 10.8.26.18@o2ib6) Aug 11 19:51:29 fir-md1-s1 kernel: Lustre: Skipped 15032 previous similar messages Aug 11 19:51:45 fir-md1-s1 kernel: LustreError: 21455:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565578215, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1df63c45c0/0x5d9ee6c5070333c1 lrc: 3/1,0 mode: --/PR res: [0x200029f1b:0x5a12:0x0].0x0 bits 
0x13/0x0 rrc: 336 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21455 timeout: 0 lvb_type: 0 Aug 11 19:51:45 fir-md1-s1 kernel: LustreError: 21455:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 11 19:53:31 fir-md1-s1 kernel: LNet: Service thread pid 23626 was inactive for 200.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 11 19:53:31 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 11 19:53:31 fir-md1-s1 kernel: Pid: 23626, comm: mdt03_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:53:31 fir-md1-s1 kernel: Call Trace: Aug 11 19:53:31 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 11 19:53:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:53:31 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0 Aug 11 19:53:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:53:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:53:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578411.23626 Aug 11 19:53:36 fir-md1-s1 kernel: Pid: 21455, comm: mdt01_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:53:36 fir-md1-s1 kernel: Call Trace: Aug 11 19:53:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:53:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:53:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:53:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:53:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:53:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:53:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:53:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:53:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:53:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578416.21455 Aug 11 19:53:55 fir-md1-s1 kernel: Pid: 23567, comm: mdt00_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:53:55 fir-md1-s1 kernel: Call Trace: Aug 11 19:53:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:53:55 
fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:53:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:53:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:53:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:53:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:53:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:53:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:53:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:53:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:53:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578435.23567 Aug 11 19:53:55 fir-md1-s1 kernel: LNet: Service thread pid 23747 was inactive for 200.65s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 11 19:53:55 fir-md1-s1 kernel: LNet: Skipped 105 previous similar messages Aug 11 19:53:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578437.21422 Aug 11 19:53:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578438.21461 Aug 11 19:53:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578439.24577 Aug 11 19:54:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578440.50444 Aug 11 19:54:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578441.23616 Aug 11 19:54:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578444.21415 Aug 11 19:54:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578447.20541 Aug 11 19:54:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578448.21332 Aug 11 19:54:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578449.97640 Aug 11 19:54:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578452.21416 Aug 11 19:54:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578454.23729 Aug 11 19:54:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578456.21003 Aug 11 19:54:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578460.10502 Aug 11 19:54:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578464.97660 Aug 11 19:54:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578465.50582 Aug 11 19:54:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578466.21669 Aug 11 19:54:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578467.23749 Aug 11 19:54:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578470.23636 Aug 11 19:54:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578471.23584 Aug 11 19:54:32 fir-md1-s1 kernel: LNet: Service thread pid 20731 completed after 796.32s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 11 19:54:32 fir-md1-s1 kernel: LNet: Skipped 38 previous similar messages Aug 11 19:54:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578476.10363 Aug 11 19:54:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578482.23653 Aug 11 19:54:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578483.27316 Aug 11 19:54:58 fir-md1-s1 kernel: Pid: 97648, comm: mdt01_087 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:54:58 fir-md1-s1 kernel: Call Trace: Aug 11 19:54:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:54:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:54:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:54:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:54:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:54:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:54:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:54:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:54:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:54:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578498.97648 Aug 11 19:54:59 fir-md1-s1 kernel: Pid: 
20460, comm: mdt01_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:54:59 fir-md1-s1 kernel: Call Trace: Aug 11 19:54:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:54:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:54:59 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:54:59 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:54:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:54:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:54:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:54:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:54:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:54:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578499.20460 Aug 11 19:55:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.24.5@o2ib6, removing former export from same NID Aug 11 19:55:01 fir-md1-s1 kernel: Lustre: Skipped 2706 previous similar messages Aug 11 19:55:10 fir-md1-s1 kernel: Pid: 23662, comm: mdt03_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:55:10 fir-md1-s1 kernel: Call Trace: Aug 11 19:55:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 
19:55:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:55:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:55:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:55:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:55:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:55:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:55:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:55:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:55:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:55:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578510.23662 Aug 11 19:55:12 fir-md1-s1 kernel: Pid: 20983, comm: mdt00_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:55:12 fir-md1-s1 kernel: Call Trace: Aug 11 19:55:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:55:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:55:12 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:55:12 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:55:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:55:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:55:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:55:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:55:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 11 19:55:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578512.20983 Aug 11 19:55:24 fir-md1-s1 kernel: Pid: 24580, comm: mdt01_058 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 11 19:55:24 fir-md1-s1 kernel: Call Trace: Aug 11 19:55:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 11 19:55:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 11 19:55:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 11 19:55:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 11 19:55:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 11 19:55:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 11 19:55:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 11 19:55:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 11 19:55:24 fir-md1-s1 kernel: [] 
0xffffffffffffffff Aug 11 19:55:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578524.24580 Aug 11 19:55:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578532.97669 Aug 11 19:55:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578533.23660 Aug 11 19:55:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578535.97649 Aug 11 19:55:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578538.22283 Aug 11 19:55:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578543.23602 Aug 11 19:55:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578548.23619 Aug 11 19:55:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578552.23706 Aug 11 19:55:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578558.23579 Aug 11 19:56:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578562.23588 Aug 11 19:56:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 9b9b8332-39fb-197d-4c4c-38d36ae981cd (at 10.8.17.2@o2ib6) in 220 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0ed9e8cc00, cur 1565578563 expire 1565578413 last 1565578343 Aug 11 19:56:03 fir-md1-s1 kernel: Lustre: Skipped 244 previous similar messages Aug 11 19:56:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578565.23647 Aug 11 19:56:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578567.23597 Aug 11 19:56:13 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 11 19:56:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578573.24587 Aug 11 19:56:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578574.27320 Aug 11 19:56:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578580.21671 Aug 11 19:56:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 11 19:57:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578626.23713 Aug 11 19:57:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565578633.23690 Aug 11 19:57:28 fir-md1-s1 kernel: Lustre: 21322:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-6), not sending early reply req@ffff8f2d0aced400 x1631549085687088/t0(0) o101->214bcacf-deef-8b1a-7220-98313adef1de@10.9.102.36@o2ib4:2/0 lens 584/3264 e 0 to 0 dl 1565578652 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 19:57:28 fir-md1-s1 kernel: Lustre: 21322:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 64 previous similar messages Aug 11 19:57:50 fir-md1-s1 kernel: Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565578663/real 0] req@ffff8f37e5e6e900 x1636761872464032/t0(0) o104->fir-MDT0002@10.8.8.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565578670 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 19:57:50 fir-md1-s1 kernel: Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 11 19:58:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.33@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2adb2c3600/0x5d9ee6c4f70f8ffb lrc: 4/0,0 mode: PR/PR res: [0x2c0001757:0xfb:0x0].0x0 bits 0x5b/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.8.33@o2ib6 remote: 0x96e5fdb266fbbc14 expref: 64 pid: 10148 timeout: 4693752 lvb_type: 0 Aug 11 19:58:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Aug 11 20:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 20:02:54 fir-md1-s1 kernel: Lustre: Skipped 8406 previous similar messages Aug 11 20:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 20:02:54 fir-md1-s1 kernel: Lustre: Skipped 11321 previous similar messages Aug 11 20:05:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 20:05:05 fir-md1-s1 kernel: Lustre: Skipped 1603 previous similar messages Aug 11 20:13:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 20:13:13 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 20:13:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 20:13:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 20:13:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f31e32c1-64a8-2ca4-c2cc-5aff17edf210 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d7bfff800, cur 1565579603 expire 1565579453 last 1565579376 Aug 11 20:13:23 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 11 20:16:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 20:23:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 11 20:23:34 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 11 20:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 20:24:06 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 20:31:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 20:31:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 20:33:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 20:33:59 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 20:34:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 20:34:31 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 20:36:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1fd70a25-0527-c4e0-9f3a-36b5a12cb9c6 (at 10.9.103.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f05eb8800, cur 1565581002 expire 1565580852 last 1565580775 Aug 11 20:36:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 20:43:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 20:43:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 20:44:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 20:44:03 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 11 20:44:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 20:44:54 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 11 20:47:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client efacc091-5bcf-c119-5aa7-9803385d714e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f056843d800, cur 1565581647 expire 1565581497 last 1565581420 Aug 11 20:47:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 20:54:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 20:54:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 20:54:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 20:54:17 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 20:56:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 20:56:02 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 21:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 941c97b1-12c0-19cf-cacc-23ad3dedd4d4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f214f20f400, cur 1565582402 expire 1565582252 last 1565582175 Aug 11 21:00:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 21:05:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 21:05:11 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 21:06:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 21:06:08 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 21:15:46 fir-md1-s1 kernel: Lustre: 24576:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565583339/real 1565583339] req@ffff8f20bba29200 x1636761911382576/t0(0) o104->fir-MDT0000@10.9.103.19@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565583346 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 21:15:54 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) 
@@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18b2823f00 x1638092142731184/t0(0) o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:29/0 lens 504/2888 e 1 to 0 dl 1565583359 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 21:15:54 fir-md1-s1 kernel: Lustre: 97669:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Aug 11 21:16:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 21:16:00 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 21:16:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bb8c2ece-f417-e8df-48b9-34282257b797 (at 10.9.104.28@o2ib4) reconnecting Aug 11 21:16:21 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 21:17:03 fir-md1-s1 kernel: Lustre: 24576:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565583416/real 1565583416] req@ffff8f20bba29200 x1636761911382576/t0(0) o104->fir-MDT0000@10.9.103.19@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565583423 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 11 21:17:03 fir-md1-s1 kernel: Lustre: 24576:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Aug 11 21:17:37 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565583367, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1d0ec58d80/0x5d9ee6c5281fa2b9 lrc: 3/1,0 mode: --/PR res: [0x200029976:0x679:0x0].0x0 bits 0x13/0x0 rrc: 87 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24580 timeout: 0 lvb_type: 0 Aug 11 21:17:37 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 62 previous similar messages Aug 11 21:18:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing 
former export from same NID Aug 11 21:18:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 21:18:13 fir-md1-s1 kernel: LustreError: 24576:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.103.19@o2ib4) failed to reply to blocking AST (req@ffff8f20bba29200 x1636761911382576 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f17d41569c0/0x5d9ee6c526ab220d lrc: 4/0,0 mode: PR/PR res: [0x200029b2a:0x772:0x0].0x0 bits 0x13/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.9.103.19@o2ib4 remote: 0xa699fcbb581ea9b0 expref: 927 pid: 21672 timeout: 4698695 lvb_type: 0 Aug 11 21:18:13 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.103.19@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 11 21:18:13 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.103.19@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f17d41569c0/0x5d9ee6c526ab220d lrc: 3/0,0 mode: PR/PR res: [0x200029b2a:0x772:0x0].0x0 bits 0x13/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.9.103.19@o2ib4 remote: 0xa699fcbb581ea9b0 expref: 928 pid: 21672 timeout: 0 lvb_type: 0 Aug 11 21:18:13 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Aug 11 21:18:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5b8de60b-3d0b-a077-a95b-7f9e0c4afce3 (at 10.9.103.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f31343ed400, cur 1565583532 expire 1565583382 last 1565583305 Aug 11 21:18:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 21:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 21:26:32 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Aug 11 21:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 21:26:32 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 11 21:28:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d3fdc2ac-e143-5817-9bf3-9bce7cd857e0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2647534000, cur 1565584090 expire 1565583940 last 1565583863 Aug 11 21:28:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 21:31:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 21:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 21:37:02 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 21:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 21:37:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 11 21:47:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 21:47:36 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 21:47:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 21:47:36 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 21:48:35 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 21:50:23 fir-md1-s1 kernel: Lustre: 22286:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565585416/real 1565585416] req@ffff8f1bded4b600 x1636761934777280/t0(0) o104->fir-MDT0000@10.9.104.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565585423 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 21:50:23 fir-md1-s1 kernel: Lustre: 22286:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Aug 11 21:50:41 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f285cdc9200 x1638250252012432/t0(0) o36->83b4afa2-a367-a71c-8602-481ad43297ce@10.8.0.68@o2ib6:16/0 lens 512/2888 e 0 to 0 dl 1565585446 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 21:50:41 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 11 21:50:44 fir-md1-s1 kernel: Lustre: 22286:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565585437/real 1565585437] req@ffff8f1bded4b600 x1636761934777280/t0(0) o104->fir-MDT0000@10.9.104.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565585444 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 11 21:50:44 fir-md1-s1 kernel: Lustre: 22286:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 11 21:51:02 fir-md1-s1 kernel: Lustre: 23622:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2ced17da00 x1640601584356608/t0(0) o101->f18f5844-4ec0-3cde-21e1-0f1a02440d5a@10.8.17.1@o2ib6:7/0 lens 584/3264 e 0 to 0 dl 1565585467 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 21:51:26 fir-md1-s1 kernel: Lustre: 22286:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent 
has timed out for slow reply: [sent 1565585479/real 1565585479] req@ffff8f1bded4b600 x1636761934777280/t0(0) o104->fir-MDT0000@10.9.104.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565585486 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 11 21:51:26 fir-md1-s1 kernel: Lustre: 22286:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 11 21:51:54 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2236cf0f00 x1641512181011152/t0(0) o101->f3635142-f27f-1f62-11cd-66dce195842b@10.8.21.17@o2ib6:29/0 lens 584/3264 e 0 to 0 dl 1565585519 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 21:51:54 fir-md1-s1 kernel: Lustre: 22279:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 11 21:52:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d8928ea-31c5-fc78-6bbb-87f37bff639b (at 10.9.104.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f135b970c00, cur 1565585523 expire 1565585373 last 1565585296 Aug 11 21:52:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 21:57:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 21:57:49 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 11 21:57:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 21:57:49 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 11 21:58:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 22:02:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 22:10:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 22:10:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 11 22:10:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 22:10:12 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 22:17:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 641a88e0-ad4f-686f-ac4f-e86e08622171 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1a2e695400, cur 1565587058 expire 1565586908 last 1565586831 Aug 11 22:17:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 22:18:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 22:20:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 11 22:20:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 22:20:47 fir-md1-s1 kernel: Lustre: 23626:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565587239/real 1565587239] req@ffff8f3942af1500 x1636761950317936/t0(0) o104->fir-MDT0000@10.8.1.32@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565587246 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 11 22:20:47 fir-md1-s1 kernel: Lustre: 23626:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 11 22:20:54 fir-md1-s1 kernel: Lustre: 10363:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3e76113600 x1638092147080016/t0(0) o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:29/0 lens 504/2888 e 1 to 0 dl 1565587259 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 22:20:54 fir-md1-s1 kernel: Lustre: 10363:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 11 22:21:00 fir-md1-s1 kernel: Lustre: 23560:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f320e39f500 x1631550124032608/t0(0) o101->98a2e267-7ec4-26e6-8e49-234410a6b030@10.9.108.35@o2ib4:5/0 lens 576/3264 e 1 to 0 dl 1565587265 ref 2 fl Interpret:/0/0 rc 0/0 Aug 11 22:21:00 fir-md1-s1 kernel: Lustre: 23560:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 11 22:21:01 fir-md1-s1 kernel: Lustre: 23626:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent 
has timed out for slow reply: [sent 1565587254/real 1565587254] req@ffff8f3942af1500 x1636761950317936/t0(0) o104->fir-MDT0000@10.8.1.32@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565587261 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 11 22:21:01 fir-md1-s1 kernel: Lustre: 23626:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 11 22:21:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bb8c2ece-f417-e8df-48b9-34282257b797 (at 10.9.104.28@o2ib4) reconnecting Aug 11 22:21:01 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 11 22:21:15 fir-md1-s1 kernel: LustreError: 23626:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.1.32@o2ib6) failed to reply to blocking AST (req@ffff8f3942af1500 x1636761950317936 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f19cfb621c0/0x5d9ee6c53f017a0b lrc: 4/0,0 mode: PR/PR res: [0x200029dfb:0x15a5:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.1.32@o2ib6 remote: 0x4c27fb9aa310857 expref: 702 pid: 22279 timeout: 4702357 lvb_type: 0 Aug 11 22:21:15 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.1.32@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 11 22:21:15 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.8.1.32@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f19cfb621c0/0x5d9ee6c53f017a0b lrc: 3/0,0 mode: PR/PR res: [0x200029dfb:0x15a5:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.1.32@o2ib6 remote: 0x4c27fb9aa310857 expref: 703 pid: 22279 timeout: 0 lvb_type: 0 Aug 11 22:23:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7d0688b3-8792-1306-7035-fa281876a9e0 (at 10.8.1.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3393ee8400, cur 1565587400 expire 1565587250 last 1565587173 Aug 11 22:23:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 22:25:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 22:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 22:34:24 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 11 22:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 22:34:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 11 22:35:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 11 22:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7eb00fb7-64a5-e8d8-573f-7b3f9c3839b7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f26a36adc00, cur 1565588184 expire 1565588034 last 1565587957 Aug 11 22:36:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 22:38:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 22:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1c6ce9ef-52f4-c6fc-e85c-db3cab3a39e1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1df2571000, cur 1565588490 expire 1565588340 last 1565588263 Aug 11 22:41:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 22:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dbabace0-0f60-e84d-bdf8-bc114da448e3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f73a8f400, cur 1565588723 expire 1565588573 last 1565588496 Aug 11 22:45:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 22:45:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 11 22:45:24 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 11 22:46:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 22:46:16 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 11 22:46:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 11 22:55:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 22:55:47 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 22:57:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 22:57:14 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 11 23:05:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 23:05:49 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 23:07:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bcba538b-a802-8154-b625-5c30bf1d04d5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22ec782c00, cur 1565590031 expire 1565589881 last 1565589804 Aug 11 23:07:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 23:07:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 23:07:19 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 11 23:09:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 11 23:11:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a22eae11-7ce6-d316-159f-602eb6cb75c6 (at 10.9.102.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45340cfc00, cur 1565590314 expire 1565590164 last 1565590087 Aug 11 23:11:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 23:15:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 11 23:15:52 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 11 23:15:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4387ce26-fd99-eebc-cf47-a5920ed7d178 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2c48a10800, cur 1565590559 expire 1565590409 last 1565590332 Aug 11 23:15:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 23:17:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 23:17:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 11 23:19:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 70bb7838-d7fb-8365-52c8-dbb835f32b8b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fdb0c400, cur 1565590779 expire 1565590629 last 1565590552 Aug 11 23:19:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 23:20:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 11 23:23:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 63f8f5e2-30f0-9946-2a19-5aa843433ec2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3946f25000, cur 1565590997 expire 1565590847 last 1565590770 Aug 11 23:23:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 11 23:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 16254501-39d9-c8c5-f6a9-5126c9fb6600 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22b0f58800, cur 1565591010 expire 1565590860 last 1565590783 Aug 11 23:23:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 11 23:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 23:29:37 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 11 23:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 23:29:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 11 23:30:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5af6e3ff-5637-2e44-92d8-cfefce7c67cc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1cd6a80400, cur 1565591459 expire 1565591309 last 1565591232 Aug 11 23:32:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 23:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33fc6e5a-5ae1-932d-89c0-98a152a4edaa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e06bd2000, cur 1565591698 expire 1565591548 last 1565591471 Aug 11 23:34:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 11 23:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 23:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 11 23:39:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 11 23:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 11 23:39:43 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 23:41:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 11 23:50:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 11 23:50:21 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 11 23:50:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 11 23:50:21 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 11 23:56:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 00:00:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 00:00:24 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 12 00:00:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 00:00:24 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 00:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7e6f5dcf-609e-2ea1-2f62-974e501f3bca (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3186895c00, cur 1565593307 expire 1565593157 last 1565593080 Aug 12 00:01:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:02:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 00:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d1177dee-a231-9127-f2bd-ed669a267a77 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1779beb800, cur 1565593641 expire 1565593491 last 1565593414 Aug 12 00:07:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:09:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 00:10:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 00:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 00:10:27 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 00:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 00:10:27 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 00:10:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 00:12:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 00:13:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 00:18:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 00:21:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 00:21:01 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 00:21:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 00:21:01 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 00:23:31 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565594604/real 1565594604] req@ffff8f16c3d48600 x1636762059199184/t0(0) o104->fir-MDT0000@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565594611 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 00:23:31 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 12 00:23:38 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565594611/real 1565594611] req@ffff8f16c3d48600 x1636762059199184/t0(0) o104->fir-MDT0000@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565594618 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 00:23:39 fir-md1-s1 kernel: Lustre: 20460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f217896b300 x1638239215289504/t0(0) o101->6bb1b23c-28f8-153d-8cc1-2ff0115f9167@10.9.106.58@o2ib4:14/0 lens 1800/3288 e 1 to 0 dl 1565594624 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 00:23:39 fir-md1-s1 kernel: Lustre: 20460:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 12 00:23:45 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565594618/real 
1565594618] req@ffff8f16c3d48600 x1636762059199184/t0(0) o104->fir-MDT0000@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565594625 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 00:23:46 fir-md1-s1 kernel: Lustre: 10150:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f44fb742100 x1631554145116208/t0(0) o101->8677433a-08df-e12f-9cbe-ab844f71c9a4@10.9.106.69@o2ib4:21/0 lens 576/3264 e 1 to 0 dl 1565594631 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 00:23:59 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565594632/real 1565594632] req@ffff8f16c3d48600 x1636762059199184/t0(0) o104->fir-MDT0000@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565594639 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 00:23:59 fir-md1-s1 kernel: Lustre: 22287:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 12 00:23:59 fir-md1-s1 kernel: LustreError: 22287:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.17.20@o2ib6) failed to reply to blocking AST (req@ffff8f16c3d48600 x1636762059199184 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2fb8cd0480/0x5d9ee6c5871280a8 lrc: 4/0,0 mode: PR/PR res: [0x2000297b6:0x1ace:0x0].0x0 bits 0x13/0x0 rrc: 84 type: IBT flags: 0x60200400000020 nid: 10.8.17.20@o2ib6 remote: 0xd90ca50174377abc expref: 388 pid: 23597 timeout: 4709721 lvb_type: 0 Aug 12 00:23:59 fir-md1-s1 kernel: LustreError: 22287:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages Aug 12 00:23:59 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.17.20@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 12 00:23:59 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 12 00:23:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: 
evicting client at 10.8.17.20@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2fb8cd0480/0x5d9ee6c5871280a8 lrc: 3/0,0 mode: PR/PR res: [0x2000297b6:0x1ace:0x0].0x0 bits 0x13/0x0 rrc: 84 type: IBT flags: 0x60200400000020 nid: 10.8.17.20@o2ib6 remote: 0xd90ca50174377abc expref: 389 pid: 23597 timeout: 0 lvb_type: 0 Aug 12 00:26:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a9534d35-abb5-3045-b46f-82a5a3c25826 (at 10.8.17.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3d54623400, cur 1565594806 expire 1565594656 last 1565594579 Aug 12 00:26:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:29:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dcef3bfd-75aa-db79-d636-f4833f39faeb (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15a70f4c00, cur 1565594970 expire 1565594820 last 1565594743 Aug 12 00:29:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 00:30:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 00:31:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 00:31:44 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 00:31:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 00:31:44 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 00:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ff6bc2ff-0593-f401-a88d-8eb18377ee77 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2af7655800, cur 1565595584 expire 1565595434 last 1565595357 Aug 12 00:39:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:42:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 00:42:02 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 12 00:42:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 00:42:02 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 00:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4630438c-f883-172a-9db5-188725180f03 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22abcb3c00, cur 1565595960 expire 1565595810 last 1565595733 Aug 12 00:46:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:51:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 07e3e9f3-2c07-2e5c-ea6d-eca046cdd69f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1af5fec000, cur 1565596299 expire 1565596149 last 1565596072 Aug 12 00:51:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:53:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 00:53:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 00:53:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 00:53:00 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 00:54:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client efe6fb3b-95a0-0e3c-717b-65215d3d723b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25013df800, cur 1565596466 expire 1565596316 last 1565596239 Aug 12 00:54:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 00:59:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 01:03:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 01:03:05 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 12 01:03:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 01:03:05 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 01:07:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 12 01:07:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 12 01:13:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 01:13:42 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 01:13:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 01:13:42 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 01:15:41 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 12 01:18:27 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 12 01:18:27 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 12 01:23:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf0430ba-095a-6a91-d11f-2c9656918777 (at 
10.8.27.11@o2ib6) reconnecting Aug 12 01:23:44 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 01:23:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1bab7781-4567-af52-5c6c-b7f8f6ece810 (at 10.8.27.11@o2ib6) Aug 12 01:23:44 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 01:23:47 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 12 01:23:47 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 12 previous similar messages Aug 12 01:23:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.21@o2ib6, removing former export from same NID Aug 12 01:23:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID Aug 12 01:23:54 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 01:23:55 fir-md1-s1 kernel: Lustre: 23695:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565598228/real 0] req@ffff8f2e6746bf00 x1636762099584256/t0(0) o104->fir-MDT0000@10.8.7.21@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565598235 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 01:23:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.20.14@o2ib6, removing former export from same NID Aug 12 01:23:55 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 12 01:23:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.30@o2ib6, removing former export from same NID Aug 12 01:23:57 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 12 01:23:58 fir-md1-s1 kernel: Lustre: 23567:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565598231/real 0] req@ffff8f062b530600 x1636762099609376/t0(0) o104->fir-MDT0000@10.8.21.16@o2ib6:15/16 lens 296/224 e 0 to 1 
dl 1565598238 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 01:23:58 fir-md1-s1 kernel: Lustre: 23567:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 12 01:23:58 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f2e227b7c50 x1631642673253632/t0(0) o4->e18301fc-f860-0db4-bf24-6c606e0cc839@10.8.8.31@o2ib6:18/0 lens 488/448 e 0 to 0 dl 1565598258 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:23:58 fir-md1-s1 kernel: LustreError: 21987:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Aug 12 01:24:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.33@o2ib6, removing former export from same NID Aug 12 01:24:01 fir-md1-s1 kernel: Lustre: Skipped 87 previous similar messages Aug 12 01:24:03 fir-md1-s1 kernel: Lustre: 23749:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565598236/real 0] req@ffff8f2aef69bc00 x1636762099658640/t0(0) o104->fir-MDT0000@10.8.1.34@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565598243 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 01:24:03 fir-md1-s1 kernel: Lustre: 23749:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 12 01:24:06 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f325b54c200 Aug 12 01:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6), client will retry: rc = -110 Aug 12 01:24:06 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f267890c600 Aug 12 01:24:06 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 01:24:07 fir-md1-s1 kernel: LustreError: 46513:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2b23b57450 x1641547515410032/t0(0) 
o3->373ffde0-f667-d416-305b-48e8d6373c5a@10.8.30.10@o2ib6:20/0 lens 488/440 e 0 to 0 dl 1565598260 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:07 fir-md1-s1 kernel: LustreError: 46513:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 8 previous similar messages Aug 12 01:24:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.31@o2ib6, removing former export from same NID Aug 12 01:24:09 fir-md1-s1 kernel: Lustre: Skipped 84 previous similar messages Aug 12 01:24:10 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1c8a102c00 Aug 12 01:24:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 46725c7e-13ed-427c-fac8-b2b98cb851a6 (at 10.8.17.12@o2ib6), client will retry: rc = -110 Aug 12 01:24:10 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3175ea2600 Aug 12 01:24:10 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f90414800 Aug 12 01:24:10 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ba8494600 Aug 12 01:24:11 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f90a70000 Aug 12 01:24:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ac4e42b8-5648-2511-97b0-70a975af15db (at 10.8.30.18@o2ib6), client will retry: rc -110 Aug 12 01:24:11 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 01:24:12 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e16c29000 Aug 12 01:24:12 fir-md1-s1 kernel: Lustre: 21430:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20d9ac9500 x1638890723410480/t0(0) 
o101->534e10c9-e8b6-b009-609a-c6de708bb45f@10.8.27.35@o2ib6:17/0 lens 512/568 e 0 to 0 dl 1565598257 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:13 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3036e9e300 x1640773011829248/t0(0) o101->05bc5852-3091-f0d1-f1b3-97406dff981f@10.9.103.16@o2ib4:18/0 lens 1792/3288 e 0 to 0 dl 1565598258 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:15 fir-md1-s1 kernel: Lustre: 26258:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565598248/real 0] req@ffff8f1881cec800 x1636762099740368/t0(0) o104->fir-MDT0000@10.8.0.68@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565598255 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 01:24:15 fir-md1-s1 kernel: Lustre: 26258:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 12 01:24:15 fir-md1-s1 kernel: Lustre: 23743:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2e01ddd100 x1631609578841600/t0(0) o101->da2044d0-4d1f-46be-9f3b-250354ced4dc@10.9.106.2@o2ib4:20/0 lens 1792/3288 e 0 to 0 dl 1565598260 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:15 fir-md1-s1 kernel: Lustre: 23743:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 12 01:24:17 fir-md1-s1 kernel: Lustre: 6550:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2959d36850 x1635360856865024/t0(0) o3->524c99bf-b747-74c3-31a4-4ee55c54fd9b@10.8.26.30@o2ib6:22/0 lens 488/440 e 0 to 0 dl 1565598262 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:17 fir-md1-s1 kernel: Lustre: 6550:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 12 01:24:21 fir-md1-s1 kernel: Lustre: 49465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early 
reply req@ffff8f2d0ec70850 x1641511671378928/t0(0) o3->e9189573-a0e4-5695-5b28-6064c28f210e@10.8.21.16@o2ib6:26/0 lens 488/440 e 0 to 0 dl 1565598266 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:21 fir-md1-s1 kernel: Lustre: 49465:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Aug 12 01:24:22 fir-md1-s1 kernel: LustreError: 46590:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 30+0s req@ffff8f25313ef050 x1631709475467184/t0(0) o4->f8938193-b6f4-691f-a9ed-5d03b37d98de@10.8.30.11@o2ib6:22/0 lens 488/448 e 0 to 0 dl 1565598262 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:22 fir-md1-s1 kernel: LustreError: 46590:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Aug 12 01:24:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.24@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f16b1f58b40/0x5d9ee6c5a6d6f2c0 lrc: 4/0,0 mode: PR/PR res: [0x2c002c7e8:0x63b:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.24@o2ib6 remote: 0x8407f27b36f4dd66 expref: 7178 pid: 20734 timeout: 4713323 lvb_type: 0 Aug 12 01:24:24 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f336af97200 Aug 12 01:24:24 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b56a94e00 Aug 12 01:24:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f8938193-b6f4-691f-a9ed-5d03b37d98de (at 10.8.30.11@o2ib6), client will retry: rc = -110 Aug 12 01:24:24 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 01:24:24 fir-md1-s1 kernel: Lustre: 46551:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. 
req@ffff8f2f21c2d050 x1631709475467168/t0(0) o4->f8938193-b6f4-691f-a9ed-5d03b37d98de@10.8.30.11@o2ib6:22/0 lens 488/448 e 0 to 0 dl 1565598262 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 12 01:24:24 fir-md1-s1 kernel: Lustre: 46551:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 12 01:24:25 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2aaa46ee00 Aug 12 01:24:25 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.25@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f34d56f4380/0x5d9ee6c5a6f56000 lrc: 4/0,0 mode: PR/PR res: [0x2c002c57b:0x19fdd:0x0].0x0 bits 0x13/0x0 rrc: 23 type: IBT flags: 0x60200400000020 nid: 10.8.8.25@o2ib6 remote: 0xd3550bd18810391c expref: 19758 pid: 23745 timeout: 4713325 lvb_type: 0 Aug 12 01:24:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.30.32@o2ib6, removing former export from same NID Aug 12 01:24:26 fir-md1-s1 kernel: Lustre: Skipped 365 previous similar messages Aug 12 01:24:26 fir-md1-s1 kernel: LustreError: 48198:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2d0ec70850 x1641511671378928/t0(0) o3->e9189573-a0e4-5695-5b28-6064c28f210e@10.8.21.16@o2ib6:26/0 lens 488/440 e 0 to 0 dl 1565598266 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:26 fir-md1-s1 kernel: LustreError: 48198:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 9 previous similar messages Aug 12 01:24:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 40s: evicting client at 10.8.27.35@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f18f2929b00/0x5d9ee6c5a6fa2970 lrc: 4/0,0 mode: PR/PR res: [0x2c002c82b:0x19a36:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.35@o2ib6 remote: 0xf689582740c10a40 expref: 103987 pid: 23745 
timeout: 4713328 lvb_type: 0 Aug 12 01:24:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Aug 12 01:24:30 fir-md1-s1 kernel: LustreError: 46564:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f25313ed850 x1631710251983744/t0(0) o3->2da7ed9b-a80c-b1ee-6b0b-514ba4c7a01e@10.8.30.32@o2ib6:25/0 lens 488/440 e 0 to 0 dl 1565598295 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:30 fir-md1-s1 kernel: LustreError: 46564:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Aug 12 01:24:30 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2ea1813e00 Aug 12 01:24:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 401116e7-2bba-1e71-6be4-4599d07f8edd (at 10.8.18.14@o2ib6), client will retry: rc -110 Aug 12 01:24:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 01:24:30 fir-md1-s1 kernel: Lustre: 49463:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:7s); client may timeout. 
req@ffff8f2f2eeb3850 x1636354012392944/t0(0) o3->401116e7-2bba-1e71-6be4-4599d07f8edd@10.8.18.14@o2ib6:23/0 lens 488/440 e 0 to 0 dl 1565598263 ref 1 fl Complete:/0/ffffffff rc -110/-1 Aug 12 01:24:30 fir-md1-s1 kernel: Lustre: 49463:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 12 01:24:30 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2959d37c50 x1634521297170720/t0(0) o3->7a7a90f2-46dd-49dc-cc68-9ea5ca5dbef1@10.8.13.5@o2ib6:5/0 lens 488/440 e 0 to 0 dl 1565598275 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:30 fir-md1-s1 kernel: Lustre: 46516:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Aug 12 01:24:32 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3280ca9400 Aug 12 01:24:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with d22c54d9-f3ee-e6f8-f34c-cd9ceccbd787 (at 10.8.2.24@o2ib6), client will retry: rc = -110 Aug 12 01:24:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 01:24:33 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28606c9600 Aug 12 01:24:34 fir-md1-s1 kernel: Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1565598267/real 0] req@ffff8f3cd0d33300 x1636762099902928/t0(0) o106->fir-MDT0000@10.8.0.67@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565598274 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 01:24:34 fir-md1-s1 kernel: Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 12 01:24:35 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f2e941c0c50 x1635711220743504/t0(0) 
o3->019fb44b-9b86-d1a1-a118-1a686bd2a9e3@10.8.18.4@o2ib6:5/0 lens 488/440 e 0 to 0 dl 1565598275 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:35 fir-md1-s1 kernel: LustreError: 24567:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 12 previous similar messages Aug 12 01:24:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.68@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f20051418c0/0x5d9ee6c5a6e1b532 lrc: 4/0,0 mode: PW/PW res: [0x20002a014:0x7f56:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x60200400010020 nid: 10.8.0.68@o2ib6 remote: 0xd3d41a4760673179 expref: 118 pid: 24580 timeout: 4713337 lvb_type: 0 Aug 12 01:24:38 fir-md1-s1 kernel: LustreError: 23751:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f20e9632800 ns: mdt-fir-MDT0002_UUID lock: ffff8f2c23c33600/0x5d9ee6c5a7025b78 lrc: 3/0,0 mode: PW/PW res: [0x2c002c7e8:0x63b:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.8.8.24@o2ib6 remote: 0x8407f27b36f4f10f expref: 2 pid: 23751 timeout: 0 lvb_type: 0 Aug 12 01:24:38 fir-md1-s1 kernel: Lustre: 23751:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:14s); client may timeout. 
req@ffff8f27f9183000 x1641144983064272/t0(0) o101->5d88905c-3620-0c9d-fbd0-b522de002dc3@10.8.8.24@o2ib6:24/0 lens 480/536 e 0 to 0 dl 1565598264 ref 1 fl Complete:/0/0 rc -107/-107 Aug 12 01:24:38 fir-md1-s1 kernel: Lustre: 23751:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 12 01:24:38 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f19ee6cd000 Aug 12 01:24:39 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f20f72a6600 Aug 12 01:24:39 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f33e2ab3800 Aug 12 01:24:43 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f159bb09a00 Aug 12 01:24:45 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d51b1ee00 Aug 12 01:24:45 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f6184d000 Aug 12 01:24:47 fir-md1-s1 kernel: Lustre: 27583:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f25313efc50 x1631561966283648/t0(0) o3->78ab2c22-394d-bdd4-0b8e-3553d6a47e28@10.8.17.2@o2ib6:22/0 lens 488/440 e 0 to 0 dl 1565598292 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:47 fir-md1-s1 kernel: Lustre: 27583:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 27 previous similar messages Aug 12 01:24:48 fir-md1-s1 kernel: LustreError: 97640:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2f3778b400 ns: mdt-fir-MDT0002_UUID lock: ffff8f2474d22f40/0x5d9ee6c5a6fbcae5 lrc: 3/0,0 mode: PW/PW res: [0x2c002c82b:0x19a35:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.8.27.35@o2ib6 remote: 
0xf689582740c10a6a expref: 40750 pid: 97640 timeout: 0 lvb_type: 0 Aug 12 01:24:48 fir-md1-s1 kernel: Lustre: 97640:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:31s); client may timeout. req@ffff8f20d9ac9500 x1638890723410480/t0(0) o101->534e10c9-e8b6-b009-609a-c6de708bb45f@10.8.27.35@o2ib6:17/0 lens 512/536 e 0 to 0 dl 1565598257 ref 1 fl Complete:/0/0 rc -107/-107 Aug 12 01:24:48 fir-md1-s1 kernel: Lustre: 97640:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Aug 12 01:24:52 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2691a45000 Aug 12 01:24:52 fir-md1-s1 kernel: LustreError: 22894:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.1.26@o2ib6 arrived at 1565598292 with bad export cookie 6746082289100042004 Aug 12 01:24:53 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2addcfd600 Aug 12 01:24:53 fir-md1-s1 kernel: LustreError: 22649:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+0s req@ffff8f1d06735050 x1631775576331104/t0(0) o3->5e395341-d08e-b211-8691-de95d36d3421@10.8.13.21@o2ib6:23/0 lens 488/440 e 0 to 0 dl 1565598293 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 01:24:53 fir-md1-s1 kernel: LustreError: 22649:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 8 previous similar messages Aug 12 01:24:56 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f34dacd2800 Aug 12 01:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 11db871c-e0a1-11c9-b9f8-671134894ee1 (at 10.8.27.30@o2ib6), client will retry: rc = -110 Aug 12 01:24:56 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f17a3f50000 Aug 12 01:24:57 fir-md1-s1 kernel: LustreError: 
20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1a77b60200 Aug 12 01:24:57 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3035f1e000 Aug 12 01:24:57 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f29654f0e00 Aug 12 01:24:57 fir-md1-s1 kernel: LustreError: 24580:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2f3778b400 ns: mdt-fir-MDT0002_UUID lock: ffff8f1c4a46f980/0x5d9ee6c5a6fc47e9 lrc: 3/0,0 mode: PW/PW res: [0x2c002c82b:0x19a36:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.8.27.35@o2ib6 remote: 0xf689582740c10a71 expref: 27253 pid: 24580 timeout: 0 lvb_type: 0 Aug 12 01:24:57 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f252ead3600 Aug 12 01:24:57 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f300c759c00 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3036f5c400 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f904e5400 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b4a0d6000 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2db22a6e00 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3372907400 Aug 12 01:24:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.13.12@o2ib6, removing former export from same NID Aug 12 01:24:58 fir-md1-s1 kernel: Lustre: Skipped 259 
previous similar messages Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e6b9e1000 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f27ab2c1800 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f29cb412e00 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2f10d5ba00 Aug 12 01:24:58 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21ca85ae00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d228a2600 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 34s: evicting client at 10.8.2.30@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1d1d504ec0/0x5d9ee6c5a6f245b1 lrc: 4/0,0 mode: PR/PR res: [0x2000297c3:0x2863:0x0].0x0 bits 0x13/0x0 rrc: 32 type: IBT flags: 0x60200400000020 nid: 10.8.2.30@o2ib6 remote: 0xe1ad9b13bedf41bb expref: 1082 pid: 23652 timeout: 4713353 lvb_type: 0 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20378:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f3280bfd100 x1636762100167216/t0(0) o104->fir-MDT0000@10.8.1.28@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f21ca85da00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1f10d1e000 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status 
-125, desc ffff8f325bf8d200 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1eda35de00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e3000da00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1b75c63000 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f339f2f9000 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2594cb1400 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2594cb0200 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2594cb3000 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f289bfdc200 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f289bfd9400 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e30009000 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2a158f6800 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f3170ebc600 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1eda35ac00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, 
status -125, desc ffff8f20e1b05e00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2782289400 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1d228a2e00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f32336ef200 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f268c7e2400 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2e3000ce00 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f294d7f6400 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2891ee9600 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f1eda358000 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20370:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.27.2@o2ib6 arrived at 1565598299 with bad export cookie 6746083025767805535 Aug 12 01:24:59 fir-md1-s1 kernel: LustreError: 20370:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 21 previous similar messages Aug 12 01:25:23 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f24a75a6300 x1636452605891984/t0(0) o101->3cd78fa7-4bd4-125a-ba70-c34fff2fc798@10.9.104.39@o2ib4:28/0 lens 584/3264 e 0 to 0 dl 1565598328 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 01:25:23 fir-md1-s1 kernel: Lustre: 24577:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Aug 12 01:25:28 
fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 54s: evicting client at 10.8.20.22@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f2015fcd7c0/0x5d9ee6c5a65892cd lrc: 3/0,0 mode: PR/PR res: [0x200029992:0x12d:0x0].0x0 bits 0x13/0x0 rrc: 47 type: IBT flags: 0x60200400000020 nid: 10.8.20.22@o2ib6 remote: 0x9514a6a619fc200d expref: 840 pid: 97647 timeout: 4713388 lvb_type: 0 Aug 12 01:25:28 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Aug 12 01:25:33 fir-md1-s1 kernel: Lustre: 97646:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:18s); client may timeout. req@ffff8f1d7598f200 x1631549154628128/t0(0) o101->ca15d879-1cb2-8780-e5e2-20230d9e27cf@10.8.28.3@o2ib6:15/0 lens 584/536 e 0 to 0 dl 1565598315 ref 1 fl Complete:/0/0 rc 0/0 Aug 12 01:25:33 fir-md1-s1 kernel: Lustre: 97646:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 45 previous similar messages Aug 12 01:25:53 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 155648 GRANT, real grant 126976 Aug 12 01:25:53 fir-md1-s1 kernel: LustreError: 21448:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 23 previous similar messages Aug 12 01:27:08 fir-md1-s1 kernel: LustreError: 42896:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 155648 GRANT, real grant 0 Aug 12 01:27:08 fir-md1-s1 kernel: LustreError: 42896:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 105 previous similar messages Aug 12 01:29:41 fir-md1-s1 kernel: LustreError: 46516:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 155648 GRANT, real grant 0 Aug 12 01:29:41 fir-md1-s1 kernel: LustreError: 46516:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 284 previous similar messages Aug 12 01:30:01 
fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 01:34:42 fir-md1-s1 kernel: LustreError: 46527:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 155648 GRANT, real grant 0 Aug 12 01:34:42 fir-md1-s1 kernel: LustreError: 46527:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1134 previous similar messages Aug 12 01:37:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 01:37:48 fir-md1-s1 kernel: Lustre: Skipped 2118 previous similar messages Aug 12 01:37:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 01:37:48 fir-md1-s1 kernel: Lustre: Skipped 3513 previous similar messages Aug 12 01:43:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 01:43:43 fir-md1-s1 kernel: Lustre: Skipped 482 previous similar messages Aug 12 01:44:46 fir-md1-s1 kernel: LustreError: 46527:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 155648 GRANT, real grant 0 Aug 12 01:44:46 fir-md1-s1 kernel: LustreError: 46527:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2461 previous similar messages Aug 12 01:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 01:48:34 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 01:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 01:48:34 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 01:54:53 fir-md1-s1 kernel: LustreError: 
29831:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 135168 GRANT, real grant 0 Aug 12 01:54:53 fir-md1-s1 kernel: LustreError: 29831:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3128 previous similar messages Aug 12 01:59:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 01:59:05 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 01:59:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 01:59:05 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 02:04:54 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 139264 GRANT, real grant 0 Aug 12 02:04:54 fir-md1-s1 kernel: LustreError: 21449:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 3116 previous similar messages Aug 12 02:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 02:09:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 02:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 02:09:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 02:13:22 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565601195/real 1565601195] req@ffff8f40e8325a00 x1636762122979568/t0(0) o104->fir-MDT0000@10.9.103.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601202 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 02:13:22 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 26 previous similar messages Aug 12 02:13:29 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1565601202/real 1565601202] req@ffff8f40e8325a00 x1636762122979568/t0(0) o104->fir-MDT0000@10.9.103.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601209 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 02:13:30 fir-md1-s1 kernel: Lustre: 10333:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f415e512d00 x1638092199508384/t0(0) o101->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:5/0 lens 1856/3288 e 1 to 0 dl 1565601215 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:13:30 fir-md1-s1 kernel: Lustre: 10333:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 34 previous similar messages Aug 12 02:13:40 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2859297200 x1636458749084208/t0(0) o101->d22c54d9-f3ee-e6f8-f34c-cd9ceccbd787@10.8.2.24@o2ib6:14/0 lens 576/3264 e 1 to 0 dl 1565601224 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:13:40 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 12 02:13:43 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565601216/real 1565601216] req@ffff8f40e8325a00 x1636762122979568/t0(0) o104->fir-MDT0000@10.9.103.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601223 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 02:13:43 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 12 02:13:57 fir-md1-s1 kernel: Lustre: 23751:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0f73c52100 x1631569086882960/t0(0) o101->d44090a1-80b0-7ccd-ebef-b445cb1b626b@10.9.104.53@o2ib4:2/0 lens 584/3264 e 1 to 0 dl 1565601242 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:13:57 fir-md1-s1 kernel: 
Lustre: 23751:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Aug 12 02:14:04 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565601237/real 1565601237] req@ffff8f40e8325a00 x1636762122979568/t0(0) o104->fir-MDT0000@10.9.103.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601244 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 02:14:04 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 12 02:14:29 fir-md1-s1 kernel: Lustre: 97660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1dcef2e600 x1631561391812928/t0(0) o101->514546c0-f541-1aff-a686-3b517b2c3225@10.9.105.10@o2ib4:4/0 lens 584/3264 e 0 to 0 dl 1565601274 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:14:29 fir-md1-s1 kernel: Lustre: 97660:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Aug 12 02:14:45 fir-md1-s1 kernel: LustreError: 23579:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565601195, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f18f5b67740/0x5d9ee6c5b2a935b3 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 201 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23579 timeout: 0 lvb_type: 0 Aug 12 02:14:45 fir-md1-s1 kernel: LustreError: 23579:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 12 02:14:46 fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565601279/real 1565601279] req@ffff8f40e8325a00 x1636762122979568/t0(0) o104->fir-MDT0000@10.9.103.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601286 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 02:14:46 
fir-md1-s1 kernel: Lustre: 10362:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 12 02:14:55 fir-md1-s1 kernel: LustreError: 23738:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565601205, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1848904380/0x5d9ee6c5b2b4e2e4 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 201 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23738 timeout: 0 lvb_type: 0 Aug 12 02:14:55 fir-md1-s1 kernel: LustreError: 23738:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Aug 12 02:15:14 fir-md1-s1 kernel: LustreError: 21416:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565601223, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0f46bbd580/0x5d9ee6c5b2c8fc69 lrc: 3/1,0 mode: --/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 201 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21416 timeout: 0 lvb_type: 0 Aug 12 02:15:14 fir-md1-s1 kernel: LustreError: 21416:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Aug 12 02:15:49 fir-md1-s1 kernel: LustreError: 10362:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.103.5@o2ib4) failed to reply to blocking AST (req@ffff8f40e8325a00 x1636762122979568 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f124428ec00/0x5d9ee6c5b2820689 lrc: 4/0,0 mode: PR/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 201 type: IBT flags: 0x60200400000020 nid: 10.9.103.5@o2ib4 remote: 0x4d6ee82faad0336 expref: 1181 pid: 27320 timeout: 4716551 lvb_type: 0 Aug 12 02:15:49 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.103.5@o2ib4 was evicted due to a lock 
blocking callback time out: rc -110 Aug 12 02:15:49 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.103.5@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f124428ec00/0x5d9ee6c5b2820689 lrc: 3/0,0 mode: PR/PR res: [0x200029791:0x7f4b:0x0].0x0 bits 0x13/0x0 rrc: 201 type: IBT flags: 0x60200400000020 nid: 10.9.103.5@o2ib4 remote: 0x4d6ee82faad0336 expref: 1182 pid: 27320 timeout: 0 lvb_type: 0 Aug 12 02:15:49 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Aug 12 02:16:01 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 143360 GRANT, real grant 0 Aug 12 02:16:01 fir-md1-s1 kernel: LustreError: 21293:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 2893 previous similar messages Aug 12 02:16:14 fir-md1-s1 kernel: Lustre: 20541:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565601367/real 1565601367] req@ffff8f10d94a8600 x1636762124253504/t0(0) o104->fir-MDT0000@10.9.103.11@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601374 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 02:16:14 fir-md1-s1 kernel: Lustre: 20541:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 12 02:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1083b713-5c9e-c952-c295-c06d1528184a (at 10.9.103.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f226367bc00, cur 1565601390 expire 1565601240 last 1565601163 Aug 12 02:16:32 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0f661a4b00 x1638092199768752/t0(0) o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:7/0 lens 504/2888 e 0 to 0 dl 1565601397 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:16:32 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 73 previous similar messages Aug 12 02:16:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 617050f2-7fc0-79b5-eb13-239caef95ea4 (at 10.9.103.11@o2ib4) in 207 seconds. I think it's dead, and I am evicting it. exp ffff8f38a87a3400, cur 1565601402 expire 1565601252 last 1565601195 Aug 12 02:16:42 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 02:21:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 02:21:19 fir-md1-s1 kernel: Lustre: Skipped 391 previous similar messages Aug 12 02:21:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 02:21:19 fir-md1-s1 kernel: Lustre: Skipped 397 previous similar messages Aug 12 02:22:49 fir-md1-s1 kernel: Lustre: 23647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565601762/real 1565601762] req@ffff8f3978232a00 x1636762127159984/t0(0) o104->fir-MDT0000@10.9.103.16@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565601769 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 02:22:49 fir-md1-s1 kernel: Lustre: 23647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 12 02:23:07 fir-md1-s1 kernel: Lustre: 10363:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f38d5acce00 x1638092200686208/t0(0) 
o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:12/0 lens 504/2888 e 0 to 0 dl 1565601792 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:23:17 fir-md1-s1 kernel: LustreError: 23647:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.103.16@o2ib4) failed to reply to blocking AST (req@ffff8f3978232a00 x1636762127159984 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f23fe1106c0/0x5d9ee6c5b47cde5f lrc: 4/0,0 mode: PR/PR res: [0x200029830:0x16cee:0x0].0x0 bits 0x13/0x0 rrc: 26 type: IBT flags: 0x60200400000020 nid: 10.9.103.16@o2ib4 remote: 0x4ab221fc02ea1014 expref: 1170 pid: 23656 timeout: 4716879 lvb_type: 0 Aug 12 02:23:17 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.103.16@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 12 02:23:17 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.103.16@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f23fe1106c0/0x5d9ee6c5b47cde5f lrc: 3/0,0 mode: PR/PR res: [0x200029830:0x16cee:0x0].0x0 bits 0x13/0x0 rrc: 26 type: IBT flags: 0x60200400000020 nid: 10.9.103.16@o2ib4 remote: 0x4ab221fc02ea1014 expref: 1171 pid: 23656 timeout: 0 lvb_type: 0 Aug 12 02:23:17 fir-md1-s1 kernel: Lustre: 21380:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f34c521f500 x1637883313840624/t0(0) o101->c3b57da6-e271-d7f3-6542-32d74e000606@10.8.2.15@o2ib6:16/0 lens 584/536 e 0 to 0 dl 1565601796 ref 1 fl Complete:/0/0 rc 0/0 Aug 12 02:26:02 fir-md1-s1 kernel: LustreError: 46511:0:(tgt_grant.c:750:tgt_grant_check()) fir-MDT0002: cli 534e10c9-e8b6-b009-609a-c6de708bb45f claims 155648 GRANT, real grant 0 Aug 12 02:26:02 fir-md1-s1 kernel: LustreError: 46511:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1929 previous similar messages Aug 12 02:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 05bc5852-3091-f0d1-f1b3-97406dff981f (at 10.9.103.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2747ba4c00, cur 1565601965 expire 1565601815 last 1565601738 Aug 12 02:26:47 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 12 02:26:47 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 72 previous similar messages Aug 12 02:28:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 12 02:28:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 12 previous similar messages Aug 12 02:28:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.20.4@o2ib6, removing former export from same NID Aug 12 02:28:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.21@o2ib6, removing former export from same NID Aug 12 02:28:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Aug 12 02:28:53 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 02:28:56 fir-md1-s1 kernel: LustreError: 48198:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2d956a8850 x1631707700525984/t0(0) 
o3->acb1aa3b-60ab-7f7c-ec38-03838117cd24@10.8.25.12@o2ib6:21/0 lens 488/440 e 0 to 0 dl 1565602161 ref 1 fl Interpret:/0/0 rc 0/0 Aug 12 02:28:56 fir-md1-s1 kernel: LustreError: 48198:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 16 previous similar messages Aug 12 02:28:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.1.35@o2ib6, removing former export from same NID Aug 12 02:28:57 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 12 02:29:00 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f350ffe5800 Aug 12 02:29:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with acb1aa3b-60ab-7f7c-ec38-03838117cd24 (at 10.8.25.12@o2ib6), client will retry: rc -110 Aug 12 02:29:00 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 12 02:29:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.20.4@o2ib6, removing former export from same NID Aug 12 02:29:39 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 12 02:31:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 02:31:21 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 12 02:31:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 02:31:21 fir-md1-s1 kernel: Lustre: Skipped 181 previous similar messages Aug 12 02:35:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 02:35:11 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 02:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 02:41:29 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 02:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 02:41:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 02:45:59 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565603152/real 1565603152] req@ffff8f4072646900 x1636762137465280/t0(0) o104->fir-MDT0000@10.9.103.20@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565603159 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 02:45:59 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 12 02:46:07 fir-md1-s1 kernel: Lustre: 23636:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f343d10aa00 x1638092203642144/t0(0) o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:12/0 lens 504/2888 e 1 to 0 dl 1565603172 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:46:07 fir-md1-s1 kernel: Lustre: 23636:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Aug 12 02:46:41 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565603194/real 1565603194] req@ffff8f4072646900 x1636762137465280/t0(0) o104->fir-MDT0000@10.9.103.20@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565603201 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 02:46:41 fir-md1-s1 kernel: Lustre: 23582:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages Aug 12 02:46:44 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f18545def00 x1631441786359680/t0(0) o101->3373de9a-85b5-9e8c-3bf7-fc7b61c3cd4b@10.8.20.2@o2ib6:19/0 lens 576/3264 e 0 to 0 dl 1565603209 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:46:44 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 
previous similar messages Aug 12 02:47:22 fir-md1-s1 kernel: LustreError: 23649:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565603152, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f13cf177bc0/0x5d9ee6c5ba2d3db8 lrc: 3/1,0 mode: --/PR res: [0x200029f1b:0x5a12:0x0].0x0 bits 0x13/0x0 rrc: 827 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23649 timeout: 0 lvb_type: 0 Aug 12 02:47:22 fir-md1-s1 kernel: LustreError: 23649:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 87 previous similar messages Aug 12 02:47:27 fir-md1-s1 kernel: LustreError: 21680:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565603157, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3983261200/0x5d9ee6c5ba31b1df lrc: 3/1,0 mode: --/PR res: [0x200029f1b:0x5a12:0x0].0x0 bits 0x13/0x0 rrc: 829 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21680 timeout: 0 lvb_type: 0 Aug 12 02:47:27 fir-md1-s1 kernel: LustreError: 21680:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Aug 12 02:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f5de3965-7389-a296-8c42-1779e3e91d02 (at 10.9.103.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f054926f000, cur 1565603250 expire 1565603100 last 1565603023 Aug 12 02:47:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 02:47:30 fir-md1-s1 kernel: Lustre: 10150:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8f3e18d6a700 x1636475078259696/t0(0) o101->d47774f8-1891-ffbb-99df-b3a519064756@10.9.109.1@o2ib4:29/0 lens 576/536 e 0 to 0 dl 1565603249 ref 1 fl Complete:/0/0 rc 0/0 Aug 12 02:47:30 fir-md1-s1 kernel: Lustre: 10150:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Aug 12 02:51:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 02:51:52 fir-md1-s1 kernel: Lustre: Skipped 138 previous similar messages Aug 12 02:51:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 02:51:52 fir-md1-s1 kernel: Lustre: Skipped 141 previous similar messages Aug 12 02:55:07 fir-md1-s1 kernel: Lustre: 24581:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565603700/real 1565603700] req@ffff8f098530bf00 x1636762141465952/t0(0) o104->fir-MDT0000@10.8.27.19@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565603707 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 02:55:07 fir-md1-s1 kernel: Lustre: 24581:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Aug 12 02:55:25 fir-md1-s1 kernel: Lustre: 21458:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d22948c00 x1637896110397952/t0(0) o101->0fbd0bc5-1b97-ce6f-ee27-bc5fe7bffe9b@10.8.27.5@o2ib6:0/0 lens 1792/3288 e 0 to 0 dl 1565603730 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 02:55:25 fir-md1-s1 kernel: Lustre: 21458:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 205 previous similar messages Aug 12 02:56:45 fir-md1-s1 kernel: LustreError: 24581:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.19@o2ib6) returned error from blocking AST (req@ffff8f098530bf00 x1636762141465952 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f349d938900/0x5d9ee6c5b519c523 lrc: 4/0,0 mode: 
PR/PR res: [0x20002a018:0x4c3b:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.19@o2ib6 remote: 0xbb3407ee5584a862 expref: 899 pid: 23665 timeout: 4719014 lvb_type: 0 Aug 12 02:56:45 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.19@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Aug 12 02:56:45 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.8.27.19@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f349d938900/0x5d9ee6c5b519c523 lrc: 3/0,0 mode: PR/PR res: [0x20002a018:0x4c3b:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.19@o2ib6 remote: 0xbb3407ee5584a862 expref: 900 pid: 23665 timeout: 0 lvb_type: 0 Aug 12 02:57:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 02:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 99d2914e-475f-4971-5251-9a253925e7fe (at 10.8.27.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25012f3000, cur 1565603852 expire 1565603702 last 1565603625 Aug 12 02:57:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 02:57:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 02:58:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 03:03:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 03:03:30 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 03:03:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 03:03:30 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 12 03:13:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 03:13:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 03:13:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 03:13:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 03:14:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 03:14:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 03:18:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 03:23:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 03:23:50 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 03:23:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 03:23:50 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 03:24:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3b5cf300-671c-34bc-248c-e2409c8f6292 (at 10.8.20.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f250eab2800, cur 1565605485 expire 1565605335 last 1565605258 Aug 12 03:24:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 03:25:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 03:32:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 03:33:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 03:34:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 03:34:14 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 03:34:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 03:34:14 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 03:36:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 03:44:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 03:44:33 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 03:44:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 03:44:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 03:45:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, 
removing former export from same NID Aug 12 03:50:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 03:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 03:54:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 03:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 03:54:50 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 04:05:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 04:05:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 04:05:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 04:05:24 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 04:10:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:10:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 04:11:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:15:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 04:15:39 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 04:15:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 04:15:39 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 04:26:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 04:26:29 fir-md1-s1 kernel: Lustre: 
Skipped 9 previous similar messages Aug 12 04:26:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 04:26:29 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 04:28:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:29:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 04:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 04:37:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 04:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 04:37:52 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 04:40:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:40:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 04:44:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:46:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:48:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 04:48:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 04:48:26 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 04:48:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 04:48:55 
fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 04:49:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:58:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 04:58:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 04:58:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 04:58:36 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 04:59:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 04:59:33 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 05:08:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 05:08:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 05:08:37 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 05:10:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 05:10:52 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 05:19:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 05:19:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 05:20:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 05:20:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 05:21:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 05:21:19 
fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 05:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 05:31:23 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 05:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 05:31:23 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 05:37:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 05:37:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 05:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 05:41:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 05:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 05:41:29 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 05:47:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 05:47:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 05:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 05:51:51 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 05:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 05:51:51 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 06:01:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 06:01:52 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar 
messages Aug 12 06:01:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 06:01:52 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 06:03:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 06:03:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 06:12:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ed462a8-ed6a-0891-ced6-ebadfda1f88d (at 10.8.8.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fcc29e000, cur 1565615553 expire 1565615403 last 1565615326 Aug 12 06:12:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 06:13:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 06:13:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 06:13:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 06:13:27 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 06:15:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 06:15:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 06:20:50 fir-md1-s1 kernel: Lustre: 97648:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f21cb76dd00 x1631770502933536/t0(0) o101->bf3478cc-569b-5c14-1a71-20ca1e1f08aa@10.8.12.12@o2ib6:25/0 lens 376/1600 e 1 to 0 dl 1565616055 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 06:20:50 fir-md1-s1 kernel: Lustre: 97648:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 12 06:21:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### 
lock callback timer expired after 29s: evicting client at 10.8.12.12@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2a6a236300/0x5d9ee6c5e94e0f5e lrc: 3/0,0 mode: CR/CR res: [0x2c002c894:0x34:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.12.12@o2ib6 remote: 0xbf262a2196449619 expref: 51 pid: 23749 timeout: 4731124 lvb_type: 0 Aug 12 06:21:04 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2cc4357400 ns: mdt-fir-MDT0002_UUID lock: ffff8f23dc630480/0x5d9ee6c5e94e1412 lrc: 1/0,0 mode: EX/EX res: [0x2c002c894:0x34:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x54801000000000 nid: 10.8.12.12@o2ib6 remote: 0xbf262a2196449635 expref: 16 pid: 22283 timeout: 0 lvb_type: 3 Aug 12 06:21:04 fir-md1-s1 kernel: LustreError: 22283:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Aug 12 06:21:05 fir-md1-s1 kernel: Lustre: 22283:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:9s); client may timeout. 
req@ffff8f21cb76dd00 x1631770502933536/t358006919326(0) o101->bf3478cc-569b-5c14-1a71-20ca1e1f08aa@10.8.12.12@o2ib6:25/0 lens 376/1568 e 1 to 0 dl 1565616055 ref 1 fl Complete:/0/0 rc -107/-107 Aug 12 06:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 06:23:30 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 06:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 06:23:30 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 06:28:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 06:28:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 06:34:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 06:34:29 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 12 06:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 06:34:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 06:38:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 06:38:28 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 06:41:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 06:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 06:45:07 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 06:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 06:45:07 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 06:52:55 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565617968/real 1565617968] req@ffff8f0892dfc800 x1636762250981744/t0(0) o104->fir-MDT0000@10.8.8.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565617975 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 12 06:52:55 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Aug 12 06:53:03 fir-md1-s1 kernel: Lustre: 23567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f077d208300 x1638092247049040/t0(0) o36->bb8c2ece-f417-e8df-48b9-34282257b797@10.9.104.28@o2ib4:8/0 lens 504/2888 e 1 to 0 dl 1565617988 ref 2 fl Interpret:/0/0 rc 0/0 Aug 12 06:53:16 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565617989/real 1565617989] req@ffff8f0892dfc800 x1636762250981744/t0(0) o104->fir-MDT0000@10.8.8.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565617996 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 12 06:53:16 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 12 06:53:23 fir-md1-s1 kernel: LustreError: 23687:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.8.33@o2ib6) failed to reply to blocking AST (req@ffff8f0892dfc800 x1636762250981744 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f34ea4da880/0x5d9ee6c5ef646b05 lrc: 4/0,0 
mode: PR/PR res: [0x2000298d1:0x12bf4:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.33@o2ib6 remote: 0x96e5fdb266fc791b expref: 227 pid: 97648 timeout: 4733085 lvb_type: 0 Aug 12 06:53:23 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.8.33@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 12 06:53:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.8.33@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f34ea4da880/0x5d9ee6c5ef646b05 lrc: 3/0,0 mode: PR/PR res: [0x2000298d1:0x12bf4:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.33@o2ib6 remote: 0x96e5fdb266fc791b expref: 228 pid: 97648 timeout: 0 lvb_type: 0 Aug 12 06:53:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 06:53:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 06:55:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 06:55:13 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 06:55:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 06:55:13 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 06:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 36c50ebf-42f1-2e51-f789-02d6d7eec692 (at 10.8.8.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3177f25c00, cur 1565618166 expire 1565618016 last 1565617939 Aug 12 06:56:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 07:04:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 07:04:51 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 07:05:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 07:05:19 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 07:05:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 07:06:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 07:06:10 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 07:06:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 07:15:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 07:15:09 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 07:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 07:15:44 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 07:17:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 07:18:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 07:18:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 07:18:46 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 07:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 07:25:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 07:28:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 07:29:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 07:29:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 07:29:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 07:29:42 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 07:29:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 07:36:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 07:36:57 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 07:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 07:39:45 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 07:44:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 07:45:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 07:47:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 07:47:10 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 07:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 07:49:46 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 07:57:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 07:57:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 07:57:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 07:57:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 07:57:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 07:59:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 07:59:51 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 08:02:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f2230cc7-c2d6-dbeb-cb28-799150a5ed60 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d3aba5400, cur 1565622179 expire 1565622029 last 1565621952 Aug 12 08:02:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 08:03:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 9854f928-923e-019b-de7e-29bff7faefbd (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0ab7902c00, cur 1565622185 expire 1565622035 last 1565621958 Aug 12 08:03:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 08:08:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 08:08:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 12 08:12:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 08:12:13 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 08:12:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 657250be-d5db-acec-954e-1239d7463eca (at 10.9.104.65@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4536585800, cur 1565622736 expire 1565622586 last 1565622509 Aug 12 08:14:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 08:14:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 08:18:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 08:18:22 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 08:22:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 08:22:38 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 08:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 08:28:51 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 08:32:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 08:32:58 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 08:35:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 08:35:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 08:37:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 08:39:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 08:39:14 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 08:39:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 08:43:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 08:43:45 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 08:47:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 08:47:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 08:47:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fb1fc55b-f12b-0e1c-df71-cad68af1a7c8 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0bef27b000, cur 1565624871 expire 1565624721 last 1565624644 Aug 12 08:47:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 08:49:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 08:49:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 12 08:54:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 08:54:38 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 08:57:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 08:57:47 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 08:59:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 08:59:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 08:59:45 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 09:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 09:05:00 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 09:06:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 09:08:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 09:08:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 09:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 09:09:55 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 09:15:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 09:15:06 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 09:21:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 09:21:08 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 09:25:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 09:25:36 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 09:27:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 09:27:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 09:30:32 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 09:31:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 09:31:39 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 12 09:35:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 09:35:59 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 09:40:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 09:40:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 09:41:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 09:41:57 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 09:46:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 09:46:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 09:50:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 09:50:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 09:51:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 09:51:59 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 09:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 09:56:06 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 10:02:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 10:02:12 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 10:02:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 10:02:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 10:06:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 10:06:42 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 10:08:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 017500a3-1d3d-914e-4000-3d279791ec17 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e479ea400, cur 1565629684 expire 1565629534 last 1565629457 Aug 12 10:08:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 10:08:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f6862648-8adb-d9db-aeab-664122b0cbef (at 10.8.26.4@o2ib6) in 228 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3f908f6800, cur 1565629685 expire 1565629535 last 1565629457 Aug 12 10:08:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 10:12:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 10:12:49 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 10:17:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 10:17:36 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 10:18:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 10:18:59 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 10:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 10:23:03 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 10:28:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 10:28:03 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 10:30:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 10:30:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 10:33:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 10:33:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 10:38:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 10:38:11 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 10:40:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 
(no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 10:40:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8245dc57-48d7-be4c-07f0-01d11921eb89 (at 10.8.2.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ed701000, cur 1565631634 expire 1565631484 last 1565631407 Aug 12 10:43:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 10:43:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 10:43:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 10:43:14 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 12 10:43:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4d027e3d-7c28-d980-baee-e42398a04d93 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f21b494f000, cur 1565631809 expire 1565631659 last 1565631582 Aug 12 10:43:29 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 10:48:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 10:48:15 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 10:52:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 10:53:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 10:53:18 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 10:53:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 10:53:18 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 10:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 10:58:47 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 11:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 11:03:22 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 11:03:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 11:03:50 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 11:09:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 11:09:22 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 11:13:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 11:13:30 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 12 11:20:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 11:20:04 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 11:23:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 11:23:57 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 12 11:25:19 fir-md1-s1 kernel: Lustre: MGS: Received new 
LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 11:25:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 11:29:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 11:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 11:30:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 11:34:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 11:34:04 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Aug 12 11:34:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 11:40:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 11:40:53 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 11:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 11:44:29 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 11:44:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 11:44:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 11:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 11:51:48 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 11:55:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 11:55:01 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 11:59:50 fir-md1-s1 kernel: Lustre: MGS: Received new 
LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 11:59:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 12:01:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 12:01:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 12:03:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 12:04:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 12:04:24 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 12:05:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 12:05:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 12:09:08 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 12 12:09:08 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 12 12:11:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 12:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 12:14:46 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 12:16:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 12:16:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 12:16:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 12:21:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 12:21:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 12:26:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 12:26:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 12:26:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 12:26:50 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 12:31:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 12:31:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 12:35:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 12:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 12:37:30 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 12:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 
10.8.10.21@o2ib6) Aug 12 12:37:30 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 12:41:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 12:41:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 12:48:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 12:48:00 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 12:48:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 12:48:00 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 12:51:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 12:51:15 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 12:58:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 12:58:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 12:58:32 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 12 12:58:32 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 13:01:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 13:01:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 13:03:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 13:09:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 13:09:32 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 13:10:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 13:10:00 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 13:12:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 13:12:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 13:20:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 13:20:47 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 13:20:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 13:20:47 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 13:22:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 13:22:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 13:31:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 13:31:10 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 13:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 13:31:37 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 13:33:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 13:33:01 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 13:42:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 13:42:58 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 13:42:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 13:42:58 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 13:44:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 13:44:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 13:46:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 13:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 13:53:06 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 12 13:53:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 13:53:06 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 13:57:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 13:57:13 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 14:03:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 14:03:09 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 14:04:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 769d013d-f990-3399-dde8-f67f737a957d (at 10.8.7.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25019d6400, cur 1565643876 expire 1565643726 last 1565643649 Aug 12 14:04:36 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 14:04:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 14:04:57 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 14:07:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 14:07:46 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 12 14:13:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 14:13:44 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 14:15:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 14:15:54 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 14:18:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 14:20:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f18f5844-4ec0-3cde-21e1-0f1a02440d5a (at 10.8.17.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33bbf08000, cur 1565644856 expire 1565644706 last 1565644629 Aug 12 14:20:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 14:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b69f9e11-4b5c-ba5f-9212-2257d036bcd0 (at 10.8.25.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f340c72f000, cur 1565644873 expire 1565644723 last 1565644646 Aug 12 14:21:13 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 14:23:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 14:23:52 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 14:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 14:26:24 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 14:30:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 14:30:39 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 14:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c916b4c-f077-8202-b2a1-76eae483981d (at 10.8.24.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f448d296000, cur 1565645583 expire 1565645433 last 1565645356 Aug 12 14:33:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 14:34:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 14:34:18 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 12 14:37:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 14:37:44 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 14:39:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 14:40:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 12 14:40:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 106c83df-ddc0-8ef3-f3b2-99ac95921e36 (at 10.8.20.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee4988c00, cur 1565646036 expire 1565645886 last 1565645809 Aug 12 14:40:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 14:40:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 106c83df-ddc0-8ef3-f3b2-99ac95921e36 (at 10.8.20.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee498e400, cur 1565646040 expire 1565645890 last 1565645813 Aug 12 14:40:40 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 14:41:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 14:41:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 14:41:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 14:44:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 14:44:55 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 12 14:45:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a2e3e0ff-df1f-dfdf-2a96-6667b7e7cc4d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f19ddc69000, cur 1565646332 expire 1565646182 last 1565646105 Aug 12 14:45:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 14:48:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 14:48:27 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 14:55:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 14:55:14 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 14:56:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 14:56:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 14:58:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 14:58:49 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 15:05:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 15:05:21 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 15:07:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 15:07:38 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 15:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5ee9cd52-fe47-bac0-6452-5da37b97f448 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ba6384c00, cur 1565647776 expire 1565647626 last 1565647549 Aug 12 15:09:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:11:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 15:11:47 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 15:15:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 15:15:53 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 12 15:22:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 15:22:13 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 15:23:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 15:23:40 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 15:24:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 15:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 15:26:24 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 15:29:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 37f6a13f-2e02-d0b9-fa38-3c617460d923 (at 10.9.109.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f433736b800, cur 1565648998 expire 1565648848 last 1565648771 Aug 12 15:29:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:31:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 71fe14ec-02bc-cf23-c327-5f99a5777b41 (at 10.8.26.4@o2ib6) in 203 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c59185400, cur 1565649074 expire 1565648924 last 1565648871 Aug 12 15:31:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:32:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 15:32:22 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 15:34:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 15:34:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 15:37:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 15:37:52 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 12 15:39:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c7b339fe-b7e8-1d83-400d-89ca749d7e2c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c59d7c800, cur 1565649565 expire 1565649415 last 1565649338 Aug 12 15:39:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:42:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 15:42:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 15:44:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 15:44:47 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 15:47:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3760c967-7374-a404-32ca-b47c886108c0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f18af627000, cur 1565650031 expire 1565649881 last 1565649804 Aug 12 15:47:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:47:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 12 15:47:52 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 15:48:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d1277529-cbf1-b0b5-ff2d-5b114cf66536 (at 10.9.112.14@o2ib4) in 220 seconds. I think it's dead, and I am evicting it. exp ffff8f2d68a89000, cur 1565650107 expire 1565649957 last 1565649887 Aug 12 15:48:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:54:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 15:54:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 15:54:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 15:54:54 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 15:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 15:58:06 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 16:05:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 16:05:21 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 16:08:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 16:08:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 16:08:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 16:08:16 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 
12 16:09:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 16:11:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 9809d419-7933-ff81-212e-bb34805054ba (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f397a0b7000, cur 1565651505 expire 1565651355 last 1565651278 Aug 12 16:11:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 16:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 16:15:44 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 16:18:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 16:18:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 16:20:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 16:20:40 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 16:27:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 16:27:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 16:28:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 16:28:24 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 16:30:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 16:32:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 12 16:32:05 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 16:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 16:37:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 16:37:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 16:38:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 16:38:24 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 16:41:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a07faacc-5423-a8e4-0c59-8c452f1dc0e0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e619fcc00, cur 1565653276 expire 1565653126 last 1565653049 Aug 12 16:41:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 16:42:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 16:42:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 16:48:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 16:48:24 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 16:49:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 16:49:46 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 12 16:57:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 12 16:58:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 16:58:25 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 16:58:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 16:58:49 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 16:59:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 16:59:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 16:59:47 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 17:05:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ded27e7-7e33-ddc2-2fe5-582327834280 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f23a68ee000, cur 1565654741 expire 1565654591 last 1565654514 Aug 12 17:05:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 17:08:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 17:08:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 17:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 17:09:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 17:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 17:09:50 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 12 17:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 17:19:56 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 17:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 17:19:56 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 17:20:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 17:20:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 17:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 127309ae-ae53-bc45-9700-eccbdd022109 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2b7cefe800, cur 1565656098 expire 1565655948 last 1565655871 Aug 12 17:28:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 17:30:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 17:30:21 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 17:30:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 17:30:21 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 12 17:36:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 17:36:56 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 17:40:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 17:40:44 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 17:40:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 17:40:44 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 12 17:49:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 17:49:47 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 17:51:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 17:51:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 17:51:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 17:51:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 17:51:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 21a70d2c-a026-0656-ff84-e27f3afcebd6 
(at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f26c89b8800, cur 1565657518 expire 1565657368 last 1565657291 Aug 12 17:51:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 18:01:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 18:01:09 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 18:01:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 18:01:09 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 18:01:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 18:01:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 18:01:30 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 18:11:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 18:11:19 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 18:11:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 18:11:19 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 18:11:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 18:11:46 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 18:21:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 18:21:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 18:21:26 fir-md1-s1 kernel: Lustre: MGS: Connection 
restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 18:21:26 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 18:21:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 18:21:54 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 18:27:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 18:30:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 010a69a0-5d4d-4b48-ab28-f753c0ce7a94 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f159bcaec00, cur 1565659805 expire 1565659655 last 1565659578 Aug 12 18:30:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 18:31:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 18:31:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 18:31:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 18:31:31 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 12 18:33:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 18:33:17 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 18:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 18:42:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 18:42:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 18:42:28 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar 
messages Aug 12 18:43:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 18:43:23 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 18:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 18:52:15 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 18:53:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e7a4821b-b08b-edb3-8ec7-fb9a70b2643d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a681c4000, cur 1565661213 expire 1565661063 last 1565660986 Aug 12 18:53:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 18:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 18:53:38 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 18:54:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 18:54:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 18:59:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 03bd30cf-ac73-2af0-b279-7658953c2d6d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f18c4157c00, cur 1565661565 expire 1565661415 last 1565661338 Aug 12 18:59:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 19:02:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 19:02:19 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 12 19:04:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 19:04:09 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 19:04:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 19:04:37 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 19:06:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 19:12:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 19:12:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 19:12:56 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 12 19:14:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 19:14:17 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 19:14:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 19:14:44 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 19:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 19:23:30 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 19:24:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 19:24:58 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 19:26:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 19:26:41 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 19:33:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 19:33:34 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 19:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 19:35:20 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 19:40:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 19:40:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 19:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 19:44:00 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 12 19:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 19:45:23 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 19:51:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 19:51:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 19:54:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 19:54:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 19:55:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 19:55:29 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 20:02:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 20:03:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 20:03:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 20:04:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 20:04:38 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 20:06:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 20:06:39 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 20:14:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 20:14:17 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 20:15:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 20:15:11 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 20:17:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 20:17:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 12 20:20:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 20:25:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 20:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 20:25:47 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 20:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 20:28:18 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 20:28:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 20:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 20:37:13 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 20:39:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 20:39:03 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 12 20:42:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 20:42:15 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 20:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 20:47:19 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 20:49:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 20:49:53 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 20:50:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 35f3aafa-cc64-522c-e36c-5d56f55728c0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f689b9400, cur 1565668227 expire 1565668077 last 1565668000 Aug 12 20:50:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 20:56:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 20:56:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 20:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 20:58:47 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 21:00:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 21:00:46 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 21:08:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 21:08:24 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 21:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 21:09:49 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 21:11:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 21:11:09 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 12 21:19:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f4de8c35-7b3c-e4e3-6279-98876b3fd4f5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27912ad800, cur 1565669983 expire 1565669833 last 1565669756 Aug 12 21:19:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 21:20:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 21:20:16 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 12 21:21:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 21:21:38 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 21:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 21:22:33 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 21:30:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 21:30:51 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 12 21:32:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 21:32:41 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 21:36:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 21:36:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 21:36:17 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 21:41:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 21:41:40 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 21:42:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 21:42:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 21:45:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 21:51:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 21:51:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 12 21:52:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 21:52:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 12 21:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 21:54:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 21:59:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 22:01:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 22:01:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 22:02:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 22:02:44 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 12 22:03:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 22:04:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 22:04:09 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 12 22:05:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 12 22:12:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 22:12:50 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 22:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 22:14:21 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 22:23:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 22:23:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 12 22:24:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 22:24:45 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 22:29:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 22:29:23 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 22:31:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 22:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 22:33:27 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 22:34:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 22:34:49 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 22:43:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 22:43:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 22:43:59 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 
12 22:45:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 22:45:35 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 22:53:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 22:53:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 22:55:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 22:55:25 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 12 22:55:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 22:55:51 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 12 23:03:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 12 23:05:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 23:05:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:05:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 23:05:30 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 23:05:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 23:05:56 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 12 23:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f3635142-f27f-1f62-11cd-66dce195842b (at 10.8.21.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3113a11400, cur 1565676692 expire 1565676542 last 1565676465 Aug 12 23:11:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:14:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client eec39e2e-2777-5680-7292-5ea6df5b8859 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16f9170400, cur 1565676845 expire 1565676695 last 1565676618 Aug 12 23:14:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8e01fede-2264-f912-b4d7-0bdd9a200466 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1bd0e8c400, cur 1565676861 expire 1565676711 last 1565676634 Aug 12 23:15:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 23:15:37 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 12 23:16:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 23:16:05 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 23:16:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 23:16:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:25:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 12 23:25:38 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 12 23:26:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 12 23:26:06 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 12 23:27:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 23:27:57 
fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:29:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 53dd556f-de5a-200e-f71f-b23d4f71ac0a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e6f8f2c00, cur 1565677766 expire 1565677616 last 1565677539 Aug 12 23:29:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 23:35:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 12 23:35:55 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 12 23:36:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e1190192-c180-5664-6c9a-5a7192939e63 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25210b0000, cur 1565678182 expire 1565678032 last 1565677955 Aug 12 23:36:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:36:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e1190192-c180-5664-6c9a-5a7192939e63 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25359d9800, cur 1565678199 expire 1565678049 last 1565677972 Aug 12 23:38:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 23:38:21 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 23:39:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 23:39:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 12 23:42:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 43ca5ef6-c9ce-61e6-b973-2f605c2d7de5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e49b04c00, cur 1565678569 expire 1565678419 last 1565678342 Aug 12 23:42:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 12 23:46:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 23:46:30 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 12 23:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 12 23:49:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 12 23:50:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 12 23:50:17 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 12 23:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0c776e76-4bfd-8dea-bc06-511ff83b74c9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4539820000, cur 1565679418 expire 1565679268 last 1565679191 Aug 12 23:56:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 12 23:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 12 23:57:19 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 00:00:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 00:00:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 00:02:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 00:02:15 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 00:07:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 00:07:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar 
messages Aug 13 00:11:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 00:11:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 00:12:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 00:12:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 00:19:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 00:19:01 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 00:22:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 00:22:45 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 00:22:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 00:22:46 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 00:26:15 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565681168/real 1565681168] req@ffff8f1c50170600 x1636763153073648/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565681175 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 13 00:26:15 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 13 00:26:22 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565681175/real 1565681175] req@ffff8f1c50170600 x1636763153073648/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565681182 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 00:26:31 fir-md1-s1 kernel: Lustre: 10364:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out 
for slow reply: [sent 1565681184/real 1565681184] req@ffff8f394d7bd400 x1636763153438864/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565681191 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 00:26:31 fir-md1-s1 kernel: Lustre: 10364:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 13 00:26:32 fir-md1-s1 kernel: Lustre: 23711:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f393324a700 x1635112927838752/t0(0) o36->a7b52302-8357-4c6d-7fa0-dadc881daf77@10.9.109.7@o2ib4:7/0 lens 504/448 e 1 to 0 dl 1565681197 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 00:26:40 fir-md1-s1 kernel: Lustre: 22005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e62f7ad00 x1631810367597888/t0(0) o101->13061d85-51ac-4b0f-0a27-af4e7a3825e8@10.8.22.3@o2ib6:15/0 lens 584/3264 e 1 to 0 dl 1565681205 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 00:26:40 fir-md1-s1 kernel: Lustre: 22005:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 13 00:26:43 fir-md1-s1 kernel: LustreError: 97660:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff8f1c50170600 x1636763153073648 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2a438d7740/0x5d9ee6c745c794a2 lrc: 4/0,0 mode: PR/PR res: [0x2c002c758:0x1809:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x7d5756caf39a8d4c expref: 1377 pid: 97660 timeout: 4796285 lvb_type: 0 Aug 13 00:26:43 fir-md1-s1 kernel: LustreError: 97660:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages Aug 13 00:26:43 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 13 00:26:43 fir-md1-s1 kernel: LustreError: Skipped 2 
previous similar messages Aug 13 00:26:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2a438d7740/0x5d9ee6c745c794a2 lrc: 3/0,0 mode: PR/PR res: [0x2c002c758:0x1809:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x7d5756caf39a8d4c expref: 1378 pid: 97660 timeout: 0 lvb_type: 0 Aug 13 00:27:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e574ad74-13db-37ea-473e-7297c488240f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2e47927400, cur 1565681259 expire 1565681109 last 1565681032 Aug 13 00:27:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 00:29:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 00:29:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 00:29:41 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 13 00:29:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 00:33:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 00:33:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 00:35:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 00:35:17 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 00:36:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 00:41:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 00:41:37 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 00:43:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 00:43:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 00:44:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 00:45:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 00:45:37 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 00:46:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e6dc2b6d-8ab3-1ddc-9fd4-e407d5e6c6f3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0640d7c800, cur 1565682401 expire 1565682251 last 1565682174 Aug 13 00:46:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 00:53:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 00:53:26 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 00:56:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 00:56:41 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 01:00:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 01:00:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 01:02:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 01:03:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 01:03:32 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 01:03:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 01:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 01:06:45 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 01:10:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 01:13:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 01:13:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 01:14:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 01:14:02 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 01:16:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 01:16:51 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 01:23:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 95c7f4ed-9113-4108-be82-3d577d82c42b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b22683400, cur 1565684613 expire 1565684463 last 1565684386 Aug 13 01:23:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 01:23:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 13 01:23:43 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 01:25:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 01:25:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 01:30:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 01:30:07 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 01:33:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 01:33:46 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 01:36:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 
01:36:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 01:41:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 01:41:05 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 01:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 01:44:25 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 01:51:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 01:51:11 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 01:51:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 01:51:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 01:52:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 18ebcc9d-2595-66c5-1c5d-3a54ce0d9336 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ec6464c00, cur 1565686331 expire 1565686181 last 1565686104 Aug 13 01:52:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 01:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 01:54:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 02:03:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 02:03:23 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 02:04:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 02:04:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 02:05:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 02:05:25 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 02:09:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 02:13:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 02:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 02:14:10 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 02:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 02:16:02 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 02:17:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 02:17:27 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 02:24:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 02:24:13 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 02:25:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0e7e4060-3ea4-76c8-3e90-550d07cc8be5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f349c2e0c00, cur 1565688320 expire 1565688170 last 1565688093 Aug 13 02:25:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 02:26:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 02:26:09 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 02:27:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 02:27:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 02:32:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 263a5ecf-91a8-bfb2-8012-8fce9c324296 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2bea99a400, cur 1565688757 expire 1565688607 last 1565688530 Aug 13 02:32:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 02:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 02:34:45 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 02:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 02:36:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 02:38:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 02:38:34 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 02:45:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 02:45:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 02:46:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 02:46:53 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 02:49:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 02:49:58 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 02:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 02:57:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 02:57:19 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 02:57:19 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 03:01:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from 
same NID Aug 13 03:01:00 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 03:05:43 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565690735/real 1565690735] req@ffff8f1fe27eef00 x1636763198663952/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565690742 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 13 03:05:43 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 13 03:05:50 fir-md1-s1 kernel: Lustre: 97662:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f170e346000 x1638695690531840/t0(0) o101->d54a8f77-2b52-6f64-1e88-3432d0c3115c@10.8.31.6@o2ib6:25/0 lens 1808/3288 e 1 to 0 dl 1565690755 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 03:05:50 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565690743/real 1565690743] req@ffff8f1fe27eef00 x1636763198663952/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565690750 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 03:05:57 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565690750/real 1565690750] req@ffff8f1fe27eef00 x1636763198663952/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565690757 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 03:06:11 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565690764/real 1565690764] req@ffff8f1fe27eef00 x1636763198663952/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565690771 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 03:06:11 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar 
message Aug 13 03:06:16 fir-md1-s1 kernel: Lustre: 23755:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f27898d3600 x1638547219497328/t0(0) o101->1890d675-ce1f-cd8f-dea3-5b5821d43c68@10.8.0.67@o2ib6:21/0 lens 584/3264 e 1 to 0 dl 1565690781 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 03:06:32 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565690785/real 1565690785] req@ffff8f1fe27eef00 x1636763198663952/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565690792 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 03:06:32 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 13 03:07:14 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565690827/real 1565690827] req@ffff8f1fe27eef00 x1636763198663952/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565690834 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 03:07:14 fir-md1-s1 kernel: Lustre: 97660:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 13 03:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d54a8f77-2b52-6f64-1e88-3432d0c3115c (at 10.8.31.6@o2ib6) reconnecting Aug 13 03:07:21 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 03:07:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3014d21f-799a-b980-ecfd-c452b16bc320 (at 10.8.31.6@o2ib6) Aug 13 03:07:21 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 13 03:07:32 fir-md1-s1 kernel: LustreError: 23584:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565690761, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2f0b3a6780/0x5d9ee6c78a6158ad lrc: 3/1,0 
mode: --/PR res: [0x2c002c39f:0x1911:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23584 timeout: 0 lvb_type: 0 Aug 13 03:07:32 fir-md1-s1 kernel: LustreError: 23584:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Aug 13 03:07:39 fir-md1-s1 kernel: Lustre: 23738:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f317cd5fb00 x1641513412457664/t0(0) o101->991d2e99-e6d7-ae16-8b90-bed371650b41@10.8.25.7@o2ib6:14/0 lens 584/3264 e 0 to 0 dl 1565690864 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 03:08:10 fir-md1-s1 kernel: LustreError: 97660:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff8f1fe27eef00 x1636763198663952 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f23e8f172c0/0x5d9ee6c78a01516a lrc: 4/0,0 mode: PR/PR res: [0x2c002c39f:0x1911:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x747118afba6f3d98 expref: 113 pid: 20461 timeout: 4806092 lvb_type: 0 Aug 13 03:08:10 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 13 03:08:10 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f23e8f172c0/0x5d9ee6c78a01516a lrc: 3/0,0 mode: PR/PR res: [0x2c002c39f:0x1911:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x747118afba6f3d98 expref: 114 pid: 20461 timeout: 0 lvb_type: 0 Aug 13 03:08:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 05849a55-330f-1eee-260a-849071e83103 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f287135fc00, cur 1565690919 expire 1565690769 last 1565690692 Aug 13 03:08:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 03:08:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8ab6a35c-bbe6-8160-ace2-48c2129890d1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3e4c27b000, cur 1565690930 expire 1565690780 last 1565690703 Aug 13 03:14:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 03:14:19 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 03:15:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 03:17:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 03:17:32 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 03:17:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 03:17:32 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 13 03:20:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 03:25:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 03:25:42 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 03:28:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 03:28:01 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 03:28:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 03:28:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 03:38:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 03:38:22 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 03:38:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 03:38:22 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 03:45:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 03:45:23 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 03:46:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 03:48:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 03:48:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 03:48:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 03:48:35 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 03:53:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 03:53:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 03:55:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 03:55:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 03:59:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 03:59:13 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 03:59:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 03:59:13 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 04:06:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d692934-5b5a-5f01-e988-3741c142d68f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ec9a6c000, cur 1565694392 expire 1565694242 last 1565694165 Aug 13 04:09:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 04:09:19 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 04:09:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 04:09:19 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 04:14:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 04:14:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 04:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 04:19:22 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 04:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 04:19:22 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 04:20:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 00405586-17d4-8adc-59e8-ba7228a75f7e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f275e8af000, cur 1565695248 expire 1565695098 last 1565695021 Aug 13 04:20:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 04:21:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f57c33e3-d4c6-967e-11e5-6eff2e5bf8de (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f275e8ab000, cur 1565695262 expire 1565695112 last 1565695035 Aug 13 04:21:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 04:29:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 26ce1cbc-b283-58f5-e636-b3693f98bc76 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1a5bac4400, cur 1565695749 expire 1565695599 last 1565695522 Aug 13 04:29:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 04:29:26 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 04:29:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 04:29:26 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 04:30:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 04:30:55 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 04:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 04:39:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 04:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 04:39:29 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 04:41:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 04:41:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 04:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 04:49:35 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 04:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 04:49:35 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 04:53:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 04:53:06 
fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 04:58:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 05:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 05:00:23 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 05:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 05:00:23 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 05:03:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 05:03:34 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 05:10:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 05:10:46 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 05:10:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 05:10:46 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 05:14:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 05:14:21 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 05:20:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 05:20:48 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 05:22:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 05:22:12 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar 
messages Aug 13 05:24:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 05:24:53 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 05:25:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 05:30:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 05:30:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 05:33:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 05:33:54 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 05:35:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 05:35:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 05:36:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 05:41:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 05:41:24 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 05:45:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 05:45:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 05:45:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 05:45:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 05:49:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 05:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 05:51:44 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 05:56:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 05:56:14 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 05:56:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 05:56:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 06:02:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 06:02:22 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 06:06:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 06:06:59 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 06:10:37 fir-md1-s1 kernel: Lustre: MGS: Received new 
LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 06:10:37 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 06:12:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 06:12:26 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 06:14:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 06:18:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 06:18:14 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 06:22:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 06:22:27 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 06:23:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 06:23:12 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 06:26:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 06:29:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 06:29:18 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 06:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 06:32:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 06:35:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 06:35:07 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 06:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 06:39:39 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 06:42:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 06:42:54 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 06:50:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 06:50:40 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 06:51:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 06:53:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 06:53:01 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 06:53:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 06:53:01 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 07:00:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 07:00:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 07:00:41 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 07:03:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 07:03:20 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 07:05:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 07:05:08 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 07:10:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 07:10:45 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 07:13:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 07:13:23 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 07:16:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 07:16:39 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 07:21:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 07:21:02 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 07:23:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 07:23:56 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 07:27:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 07:27:49 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 07:31:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 07:31:13 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 07:34:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 07:34:02 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 07:38:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 07:38:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 07:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 07:41:22 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 07:42:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3aff8b38-4e46-4cf3-0260-172df2efe7c9 (at 10.9.103.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fe68d400, cur 1565707347 expire 1565707197 last 1565707120 Aug 13 07:42:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 07:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 07:44:29 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 07:49:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 07:49:55 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 07:52:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 07:52:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 07:54:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 07:54:30 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 08:02:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 08:02:44 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 08:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 08:03:13 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 08:04:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 08:04:35 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 08:13:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 08:13:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 08:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 
13 08:14:41 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 08:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 08:14:41 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 08:24:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 08:24:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 08:24:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 08:24:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 08:24:42 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 08:24:42 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 08:35:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 08:35:41 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 08:35:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 08:35:41 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 08:36:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 08:36:07 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 08:45:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 08:45:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 08:45:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 08:45:41 fir-md1-s1 kernel: Lustre: Skipped 20 previous 
similar messages Aug 13 08:50:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 08:50:10 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 08:56:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 08:56:04 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 08:57:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 08:57:26 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 09:01:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 09:01:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 09:06:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 09:06:23 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 09:08:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 09:08:13 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 09:13:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 09:13:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 09:16:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 09:16:29 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 09:19:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 09:19:15 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 09:19:16 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 09:23:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 09:23:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 09:26:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 09:26:53 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 09:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 09:29:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 09:34:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 09:34:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 09:37:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 09:37:18 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 09:39:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 09:39:51 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 09:45:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 09:45:53 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 09:47:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 09:47:34 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 09:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 
10.8.10.21@o2ib6) reconnecting Aug 13 09:50:19 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 09:57:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 09:57:40 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 09:59:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 09:59:29 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 10:02:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 10:02:12 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 10:08:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 10:08:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 10:12:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 10:12:14 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 10:13:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 10:13:06 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 10:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 10:19:00 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 13 10:21:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 10:21:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 13 10:22:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 10:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 10:22:15 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 10:24:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 10:24:34 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 10:29:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 10:29:29 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 10:33:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 10:33:15 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 10:34:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 10:34:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 10:39:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 10:39:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 10:43:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 10:43:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 10:45:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 10:45:37 fir-md1-s1 kernel: Lustre: Skipped 2 
previous similar messages Aug 13 10:49:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 10:49:54 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 13 10:54:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 10:54:26 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 11:00:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 11:00:46 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 11:00:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 11:00:46 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 11:04:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 11:04:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 11:11:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 11:11:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 11:11:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 11:11:21 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 11:15:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 11:15:31 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 11:21:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 11:21:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 11:21:56 fir-md1-s1 kernel: Lustre: 
MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 11:21:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 11:26:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 11:26:10 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 11:31:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 11:31:32 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 11:35:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 11:35:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 11:35:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8cbc76bf-3d76-f93c-6762-65949a459c4a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34f38b5800, cur 1565721321 expire 1565721171 last 1565721094 Aug 13 11:35:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 11:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 11:36:31 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 11:37:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 11:42:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 11:42:27 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 11:45:11 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565721900/real 1565721900] req@ffff8f27f47bf200 x1636763326191232/t0(0) o104->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565721911 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 13 11:45:11 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 13 11:45:15 fir-md1-s1 kernel: Lustre: 21460:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ba9234800 x1640603965422944/t0(0) o36->f2ac4397-2e51-a615-ba22-d10920eaecbc@10.9.116.7@o2ib4:20/0 lens 528/448 e 1 to 0 dl 1565721920 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 11:45:18 fir-md1-s1 kernel: Lustre: 50444:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d8e50d100 x1640678784308816/t0(0) o101->a4eb6a9d-f044-2635-b3fe-16ba018530e3@10.8.25.23@o2ib6:23/0 lens 584/3264 e 1 to 0 dl 1565721923 ref 2 fl Interpret:/0/0 rc 0/0 Aug 13 11:45:18 fir-md1-s1 kernel: Lustre: 50444:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 13 11:45:22 fir-md1-s1 kernel: Lustre: 24580:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565721911/real 1565721911] req@ffff8f27f47bf200 x1636763326191232/t0(0) o104->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565721922 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 13 11:45:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 715cf824-85b5-4eda-cd80-e9c9e34696d0 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2124fb4000, cur 1565721937 expire 1565721787 last 1565721710 Aug 13 11:45:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 11:47:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 11:47:25 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 11:48:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 11:48:18 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 11:52:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 11:52:52 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 13 11:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 11:58:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 11:58:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 11:58:49 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 12:02:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 12:02:57 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 12:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 12:08:53 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 12:10:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 12:10:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 12:12:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 12:12:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 12:14:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 12:18:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a39badbd-626d-a84b-e112-dc0449a95822 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2609246c00, cur 1565723913 expire 1565723763 last 1565723686 Aug 13 12:18:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 12:18:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a39badbd-626d-a84b-e112-dc0449a95822 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d96ca5c00, cur 1565723921 expire 1565723771 last 1565723694 Aug 13 12:19:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 12:19:42 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 12:22:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 12:22:29 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 12:22:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 12:22:59 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 12:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5410ef73-212c-0ef2-08c0-7480a43850e6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f167f47e800, cur 1565724347 expire 1565724197 last 1565724120 Aug 13 12:25:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 12:31:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 12:31:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 12:31:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 11de7dc7-0692-a3dd-b019-6403f601a7bc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c005e6000, cur 1565724701 expire 1565724551 last 1565724474 Aug 13 12:31:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 12:32:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 12:32:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 12:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 12:34:00 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 13 12:41:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 12:41:19 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 12:44:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 12:44:14 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 12:48:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 12:48:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 12:49:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 13 12:52:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 12:52:18 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 12:54:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 12:54:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 12:59:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 12:59:09 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 13:02:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 13:02:22 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 13:04:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 13:04:39 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 13:07:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 13:09:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 13:09:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 13:12:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 13:12:25 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 13:14:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 13:14:42 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 13:19:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 13:22:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 13:22:31 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 13:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4a752229-9395-d33b-fa36-6a220f792547 (at 10.9.109.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f39c9c00800, cur 1565727798 expire 1565727648 last 1565727571 Aug 13 13:23:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 13:24:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 13:24:44 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 13:25:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 13:26:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 13 13:32:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 13:32:49 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 13:34:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 13:34:08 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 13:35:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 13:35:16 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 13:44:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 13:44:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 13:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 13:44:53 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 13:45:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 13:45:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 13:55:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 13:55:16 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 13:56:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 13:56:28 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 13:56:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 13:56:28 fir-md1-s1 kernel: Lustre: Skipped 23 
previous similar messages Aug 13 13:58:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 14:05:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 14:05:41 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 14:06:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 14:06:36 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 14:06:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 14:06:36 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 14:16:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 14:16:06 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 14:16:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 14:16:59 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 14:19:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 14:19:37 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 14:27:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 14:27:18 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 14:27:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 14:27:18 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 14:30:53 fir-md1-s1 kernel: 
Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 14:30:53 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 14:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 14:37:30 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 14:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 14:37:30 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 14:42:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 14:42:01 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 14:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0ab6aebd-c723-d3d9-d148-ea337f4d2f2a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f4166c00, cur 1565732663 expire 1565732513 last 1565732436 Aug 13 14:44:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 14:48:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 14:48:24 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 14:48:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 14:48:24 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 14:53:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 14:53:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 14:59:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 14:59:16 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 14:59:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 14:59:42 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 15:03:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 15:03:45 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 15:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 15:10:13 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 15:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 15:10:13 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 13 15:16:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from 
same NID Aug 13 15:16:22 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 15:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 15:20:24 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 15:20:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 15:20:24 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 15:22:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 15:27:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 15:27:12 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 15:30:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 15:30:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 15:30:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 15:30:47 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 15:34:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 15:38:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 15:38:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 15:40:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 15:40:51 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 15:41:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 15:41:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 15:41:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 15:50:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 15:50:01 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 15:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 15:51:50 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 13 15:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 15:51:50 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 15:55:39 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 13 15:56:13 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 13 15:59:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 13 16:00:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 16:00:06 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 16:02:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 16:02:45 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 16:02:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 16:02:45 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 13 16:04:43 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 13 16:04:43 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 13 16:08:36 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 13 16:10:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 16:10:27 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 16:10:55 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 13 16:12:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 16:12:58 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 16:12:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 16:12:58 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 
13 16:23:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 16:23:38 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 16:23:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 16:23:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 16:25:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 16:25:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 16:34:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 16:34:01 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 16:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 16:35:19 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 16:36:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 16:36:26 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 16:44:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 16:44:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 16:45:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 16:45:29 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 16:48:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 16:48:40 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 16:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection 
restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 16:54:37 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 16:55:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 16:55:32 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 16:58:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 16:58:41 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 17:05:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 17:05:06 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 17:05:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 17:05:34 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 17:09:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 17:09:10 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 17:13:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 17:15:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 17:15:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 17:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 17:16:28 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 17:21:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 17:21:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 17:25:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 17:25:33 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 17:26:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 17:26:36 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 17:34:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 17:34:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 17:36:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 17:36:07 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 17:36:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 17:36:38 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 17:46:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 17:46:15 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 17:46:42 fir-md1-s1 kernel: Lustre: MGS: Received new 
LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 17:46:42 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 17:47:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 17:47:10 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 17:56:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 17:56:15 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 17:57:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 17:57:10 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 18:03:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 18:03:29 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 18:07:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 18:07:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 18:07:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 18:07:34 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 18:15:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 18:15:40 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 18:19:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 18:19:42 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 18:19:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 
10.8.10.21@o2ib6) Aug 13 18:19:42 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 13 18:23:11 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 13 18:27:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 18:27:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 18:27:27 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 18:30:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 18:30:10 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 18:30:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 18:30:10 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 18:38:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 18:38:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 18:40:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 18:40:13 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 18:40:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 18:40:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 18:41:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 18:50:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 18:50:40 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 18:50:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 18:50:40 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 18:52:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 18:52:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 19:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 19:00:45 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 19:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 19:00:45 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 19:01:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 19:03:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 19:03:29 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 19:10:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 19:10:51 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 19:10:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 19:10:51 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 19:15:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 19:15:29 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 19:21:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 19:21:48 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 19:23:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 19:23:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 19:25:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 19:26:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 19:26:14 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 19:31:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 19:31:54 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 19:33:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 19:33:17 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 19:40:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 521d20de-a9b7-d445-b6af-cebfe3f1bae7 (at 10.9.108.65@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f39c0b99400, cur 1565750402 expire 1565750252 last 1565750175 Aug 13 19:40:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 19:40:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 19:40:06 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 19:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 19:42:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 19:43:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 19:43:21 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 19:51:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 19:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 19:52:15 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 19:53:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 19:53:26 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 19:55:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 19:55:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 20:03:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 20:03:02 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 20:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 20:03:33 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 13 20:05:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 20:05:45 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 20:13:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 20:13:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 20:13:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 20:13:56 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 20:16:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 20:16:15 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 20:17:36 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 20:21:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 20:22:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 20:23:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 20:23:28 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 13 20:24:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 20:24:27 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 20:26:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 20:26:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 20:34:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 20:34:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 20:34:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 20:34:35 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 20:36:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 20:36:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 20:45:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 
10.8.12.12@o2ib6) Aug 13 20:45:07 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 20:45:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 20:45:10 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 20:47:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 20:47:48 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 20:48:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 20:55:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 20:55:40 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 20:55:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 20:55:40 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 20:58:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 20:58:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 21:06:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 21:06:30 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 13 21:06:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 21:06:30 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 21:10:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 21:10:05 fir-md1-s1 kernel: 
Lustre: Skipped 7 previous similar messages Aug 13 21:16:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 21:16:35 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 21:16:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 21:16:35 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 21:21:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 21:21:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 21:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 21:27:00 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 21:27:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 21:27:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 21:32:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 21:32:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 21:37:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 21:37:11 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 21:37:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 21:37:11 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 21:43:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 21:43:25 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 
21:47:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 21:47:54 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 21:47:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 21:47:54 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 21:53:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 21:53:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 13 21:58:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 21:58:00 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 13 21:58:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 21:58:28 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 22:04:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 22:04:00 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 13 22:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 22:09:55 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 13 22:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 22:09:55 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 22:14:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 22:14:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 22:19:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 22:19:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 13 22:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 22:20:54 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 22:25:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 22:25:31 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 22:29:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 22:29:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 22:30:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 22:30:43 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 22:32:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 13 22:32:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 22:32:05 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 22:37:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 22:37:30 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 13 22:41:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 22:41:07 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 13 22:42:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 81bfeebd-398e-a3c9-b1b1-62fc749f8701 (at 10.9.103.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1244535400, cur 1565761339 expire 1565761189 last 1565761112 Aug 13 22:42:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 22:43:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 22:43:27 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 22:48:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 22:48:25 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 13 22:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 22:51:10 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 22:53:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 22:53:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 13 23:00:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 
13 23:00:06 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 23:01:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 23:01:54 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 23:04:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 23:04:11 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 13 23:07:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 23:11:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 23:11:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 23:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 23:12:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 23:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 23:14:58 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 23:22:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 23:22:16 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 13 23:23:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 23:23:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 23:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 13 23:26:15 fir-md1-s1 kernel: Lustre: Skipped 14 previous 
similar messages Aug 13 23:30:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 13 23:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 23:33:36 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 13 23:34:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 13 23:34:03 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 13 23:36:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 23:36:46 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 13 23:44:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 13 23:44:21 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 13 23:44:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 23:44:48 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 13 23:47:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 23:47:33 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 13 23:54:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 13 23:54:22 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 13 23:54:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 13 23:54:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 13 23:59:15 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 13 23:59:15 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 00:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 00:04:31 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 00:06:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 00:06:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 00:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 00:10:53 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 00:15:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 00:15:05 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 00:16:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 00:16:52 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 00:18:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 00:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 00:20:54 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 00:23:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 00:25:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 00:25:26 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 00:29:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 00:29:36 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 00:30:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 00:30:59 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 00:33:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 00:35:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 00:35:32 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 00:41:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 00:41:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 00:43:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 00:43:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 00:43:46 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 00:45:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 00:45:35 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 00:51:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 00:51:36 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 00:54:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 00:54:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 00:55:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 00:55:39 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 01:02:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 01:02:04 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 01:06:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 01:06:23 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 01:09:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 01:09:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 01:12:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 01:12:12 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 01:16:44 fir-md1-s1 kernel: Lustre: MGS: Connection 
restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 01:16:44 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 14 01:22:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 01:22:09 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 01:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 01:22:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 01:27:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 01:27:08 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 01:33:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 01:33:03 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 01:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 01:33:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 01:37:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 01:37:11 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 01:43:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 01:43:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 01:43:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 01:43:42 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 01:47:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 
10.8.12.12@o2ib6) Aug 14 01:47:19 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 01:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 01:54:06 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 01:55:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 01:55:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 01:58:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 01:58:15 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 02:04:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 02:04:35 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 02:05:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 02:05:02 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 02:09:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 02:09:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 02:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 02:15:01 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 02:16:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 02:16:18 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 02:20:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 02:20:33 fir-md1-s1 kernel: Lustre: 
Skipped 20 previous similar messages Aug 14 02:26:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 02:26:55 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 02:26:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 02:26:57 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 02:30:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 02:30:40 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 02:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 02:37:03 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 02:37:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 02:37:22 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 02:41:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 02:41:09 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 02:48:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 02:48:58 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 02:51:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 02:51:19 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 02:52:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 02:52:12 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 02:59:20 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 02:59:20 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 03:01:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 03:01:39 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 03:04:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 03:04:04 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 03:10:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 03:10:40 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 03:11:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 03:11:51 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 03:15:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 03:15:06 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 03:21:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 03:21:51 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 03:22:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 03:22:23 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 03:28:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 03:28:37 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 03:32:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 03:32:14 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 03:32:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 03:32:25 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 03:39:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 03:39:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 03:42:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 03:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 03:42:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 03:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 03:42:37 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 03:52:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 03:52:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 03:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 03:52:58 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 03:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 03:52:58 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 03:57:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 04:02:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 04:02:51 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 04:03:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 04:03:18 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 04:03:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 04:03:18 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 04:10:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 04:12:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 04:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 04:13:45 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 04:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 04:13:45 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 04:16:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 04:16:42 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 04:24:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 04:24:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 04:24:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 04:24:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 04:29:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 04:29:31 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 04:34:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 04:34:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 04:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 04:35:01 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 04:39:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 04:39:37 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 04:41:12 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 04:44:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 04:44:22 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 04:46:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 04:46:24 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 04:49:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 04:49:58 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 04:54:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 04:54:59 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 04:57:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 04:57:15 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 05:01:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 05:01:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 05:05:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 05:05:24 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 05:08:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 05:08:35 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 05:12:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export 
from same NID Aug 14 05:12:12 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 05:16:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 05:16:01 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 05:18:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 05:18:44 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 05:24:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 05:24:25 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 05:26:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 05:26:11 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 05:28:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 05:28:48 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 05:34:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 05:34:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 05:35:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 05:36:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 05:36:33 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 05:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 05:38:51 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 05:45:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 05:45:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 05:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 05:46:33 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 14 05:49:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 05:49:42 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 05:56:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 05:56:45 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 05:56:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 05:56:56 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 05:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 05:59:58 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 06:06:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 06:06:45 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 06:07:40 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 06:07:40 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 06:10:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 06:10:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 06:17:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 06:17:17 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 06:19:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 06:19:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 06:21:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 06:21:20 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 06:22:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 06:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 06:27:40 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 06:31:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 06:31:28 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 06:31:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 06:31:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 06:38:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 06:38:28 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 06:41:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 06:41:35 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 06:42:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 06:42:54 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 06:48:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 06:48:47 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 06:51:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 06:51:55 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 06:52:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 06:52:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 06:56:05 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 06:59:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 06:59:37 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 07:02:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 07:02:49 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 07:04:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 07:04:21 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 07:09:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 07:09:52 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 07:13:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 07:13:29 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 07:14:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 07:14:51 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 07:21:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 07:21:18 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 07:24:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 07:24:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 07:24:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former 
export from same NID Aug 14 07:24:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 07:31:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 07:31:27 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 07:34:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 07:34:12 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 07:35:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 07:35:05 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 07:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 07:41:27 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 07:41:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 07:42:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 07:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 07:45:09 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 07:46:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 07:46:02 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 07:51:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 07:51:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 07:56:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 07:56:10 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 07:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 07:56:37 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 07:57:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 07:58:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 08:02:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 08:02:37 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 08:06:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 08:06:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 08:07:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 08:07:38 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 08:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 08:14:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 08:17:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 08:17:34 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 08:19:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 08:19:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 08:24:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 08:24:59 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 08:27:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 08:27:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 08:27:43 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 08:29:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2de7c91b-6680-d047-e089-06e894cb4ae5 (at 10.8.23.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ff4a97000, cur 1565796560 expire 1565796410 last 1565796333 Aug 14 08:29:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 08:30:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 08:30:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 08:30:35 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 08:35:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 08:35:24 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 14 08:38:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 08:38:08 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 08:40:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 08:41:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 08:41:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 08:41:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 08:47:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 08:47:37 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 08:49:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 08:49:25 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 08:53:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 08:53:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 08:58:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 08:58:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 08:59:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 08:59:30 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 09:04:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 09:04:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 09:08:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 09:08:31 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 09:09:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 09:09:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 09:16:12 fir-md1-s1 kernel: Lustre: MGS: Received new 
LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 09:16:12 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 09:19:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 09:19:19 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 09:20:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 09:20:13 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 09:22:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 09:22:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 09:26:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 09:30:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 09:30:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 09:30:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 09:30:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 09:30:42 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 09:30:42 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 09:41:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 09:41:02 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 09:41:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 09:41:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 09:41:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 09:41:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 09:42:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 09:51:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 09:51:44 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 09:52:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 09:52:19 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 09:52:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 09:52:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 09:56:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 10:01:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 10:01:49 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 10:02:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 10:02:44 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 10:03:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 10:03:10 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 10:09:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 10:11:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 10:11:50 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 10:14:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 10:14:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 10:21:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 10:21:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 10:21:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 10:21:54 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 10:23:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 10:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 10:25:04 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 10:32:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 10:32:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 10:32:01 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 10:32:01 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 10:33:55 fir-md1-s1 kernel: Lustre: 23655:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565804028/real 1565804028] req@ffff8f3dbdd89500 x1636764371104208/t0(0) o104->fir-MDT0002@10.9.0.63@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565804035 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 14 10:33:55 fir-md1-s1 kernel: Lustre: 23655:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 14 10:35:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 10:35:30 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 10:37:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5008638b-f18b-9318-c34f-d074d1352432 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17e8204800, cur 1565804277 expire 1565804127 last 1565804050 Aug 14 10:37:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 10:38:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5008638b-f18b-9318-c34f-d074d1352432 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f1b827c00, cur 1565804289 expire 1565804139 last 1565804062 Aug 14 10:38:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 10:42:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 10:42:25 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 14 10:43:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d36a60c6-ac8c-d8b8-478f-2092dc2a81c2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e00183800, cur 1565804630 expire 1565804480 last 1565804403 Aug 14 10:45:31 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 10:45:31 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 10:45:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 10:45:32 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 10:52:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 10:52:52 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 10:53:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 10:55:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 10:55:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 10:57:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 10:57:12 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 11:03:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 11:03:02 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 11:06:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 11:06:40 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 11:07:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 11:07:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 11:10:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 11:13:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 11:13:50 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 11:18:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 11:18:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 11:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 11:18:31 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 11:24:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 11:24:22 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 11:25:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 11:28:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 11:28:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 11:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 11:29:07 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 11:34:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 11:34:31 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 11:39:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 11:39:05 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 11:39:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 11:39:25 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 11:44:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 11:44:35 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 11:48:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 11:49:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 11:49:57 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 11:52:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 11:52:45 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 11:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0df7430c-befd-bc46-ea78-35a335dabcbd (at 10.9.112.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15afcec400, cur 1565808927 expire 1565808777 last 1565808700 Aug 14 11:55:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 11:56:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 11:56:27 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 11:57:46 fir-md1-s1 kernel: Lustre: 23604:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2a6a31b900 x1634627359629168/t0(0) o36->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:21/0 lens 552/2888 e 1 to 0 dl 1565809071 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 11:57:52 fir-md1-s1 kernel: Lustre: 23703:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b440bec00 x1631556047504560/t0(0) o36->603ef852-66df-b745-900b-b12995ddbb59@10.9.104.51@o2ib4:27/0 lens 552/2888 e 1 to 0 dl 1565809077 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 11:58:16 fir-md1-s1 kernel: Lustre: 21461:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1d53b44800 x1639522457575904/t0(0) o36->04c17dce-45f1-fe7e-2627-7efeaaeaddb9@10.9.0.62@o2ib4:21/0 lens 544/2888 e 0 to 0 dl 1565809101 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 11:58:33 
fir-md1-s1 kernel: Lustre: 21429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f22b2c91e00 x1638288265216208/t0(0) o36->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:8/0 lens 536/2888 e 0 to 0 dl 1565809118 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 11:58:41 fir-md1-s1 kernel: Lustre: 24586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d54246000 x1634134589748736/t0(0) o36->c1420e99-ffe3-a133-75d0-8971e96a81cc@10.9.106.36@o2ib4:16/0 lens 552/2888 e 1 to 0 dl 1565809126 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 11:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 14 11:59:01 fir-md1-s1 kernel: LustreError: 23725:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565809051, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2d4e9e8900/0x5d9ee6cb3279f8ad lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6cb3279f8c9 expref: -99 pid: 23725 timeout: 0 lvb_type: 0 Aug 14 11:59:01 fir-md1-s1 kernel: LustreError: 23725:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Aug 14 11:59:38 fir-md1-s1 kernel: LustreError: 20723:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565809088, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2436795340/0x5d9ee6cb32c2919f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 25 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20723 timeout: 0 lvb_type: 0 Aug 14 11:59:39 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Wed Aug 14 
11:59:39 2019 Aug 14 11:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 46725c7e-13ed-427c-fac8-b2b98cb851a6 (at 10.8.17.12@o2ib6) reconnecting Aug 14 11:59:58 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Aug 14 12:02:38 fir-md1-s1 kernel: Lustre: 23714:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2f99f18300 x1634627359871936/t0(0) o36->46725c7e-13ed-427c-fac8-b2b98cb851a6@10.8.17.12@o2ib6:13/0 lens 552/2888 e 0 to 0 dl 1565809363 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 12:03:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 14 12:03:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 12:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 14 12:03:43 fir-md1-s1 kernel: LustreError: 20996:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565809333, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2b4d9c3600/0x5d9ee6cb34599770 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 10 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6cb34599777 expref: -99 pid: 20996 timeout: 0 lvb_type: 0 Aug 14 12:03:43 fir-md1-s1 kernel: LustreError: 20996:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 14 12:03:52 fir-md1-s1 kernel: Lustre: 20721:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1ec0af1800 x1638809299869648/t0(0) o36->ca693efe-e963-3124-a59d-0beac55f4de3@10.9.112.17@o2ib4:27/0 lens 592/2888 e 1 to 0 dl 1565809437 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 12:03:52 fir-md1-s1 kernel: Lustre: 
20721:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Aug 14 12:04:28 fir-md1-s1 kernel: Lustre: 23654:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3eeb24c800 x1639333764336992/t0(0) o36->39e76845-4976-21c9-38bb-bb738759d72c@10.9.0.64@o2ib4:3/0 lens 528/2888 e 0 to 0 dl 1565809473 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 12:04:28 fir-md1-s1 kernel: Lustre: 23654:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 14 12:08:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 12:08:01 fir-md1-s1 kernel: Lustre: Skipped 92 previous similar messages Aug 14 12:09:16 fir-md1-s1 kernel: Lustre: 23583:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2c74a04800 x1633776974661664/t0(0) o36->5ef45f19-459d-828d-fcff-ba0df2051c6a@10.8.15.8@o2ib6:21/0 lens 528/2888 e 0 to 0 dl 1565809761 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 12:10:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 04c17dce-45f1-fe7e-2627-7efeaaeaddb9 (at 10.9.0.62@o2ib4) reconnecting Aug 14 12:10:07 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Aug 14 12:10:21 fir-md1-s1 kernel: LustreError: 23619:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565809731, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f15bfd32400/0x5d9ee6cb36f5daa7 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 16 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23619 timeout: 0 lvb_type: 0 Aug 14 12:10:37 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 14 12:10:37 
fir-md1-s1 kernel: LustreError: 20724:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565809747, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f17c7140d80/0x5d9ee6cb37084398 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6cb3708439f expref: -99 pid: 20724 timeout: 0 lvb_type: 0 Aug 14 12:10:37 fir-md1-s1 kernel: LustreError: 20724:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Aug 14 12:12:11 fir-md1-s1 kernel: Lustre: 10504:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0e11273300 x1631556048688144/t0(0) o36->603ef852-66df-b745-900b-b12995ddbb59@10.9.104.51@o2ib4:16/0 lens 552/2888 e 0 to 0 dl 1565809936 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 12:12:11 fir-md1-s1 kernel: Lustre: 10504:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 14 12:13:16 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 14 12:13:19 fir-md1-s1 kernel: LustreError: 23602:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565809909, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f11124d1d40/0x5d9ee6cb3834355b lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 14 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23602 timeout: 0 lvb_type: 0 Aug 14 12:13:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 12:13:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 12:19:27 fir-md1-s1 kernel: Lustre: 
23619:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2fb7ed8f00 x1634134591776800/t0(0) o36->c1420e99-ffe3-a133-75d0-8971e96a81cc@10.9.106.36@o2ib4:1/0 lens 552/2888 e 0 to 0 dl 1565810371 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 12:19:27 fir-md1-s1 kernel: Lustre: 23619:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 14 12:19:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d314c916-d68d-db9f-ee0f-59ee4d488258 (at 10.9.106.36@o2ib4) Aug 14 12:19:33 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 14 12:20:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 07fba339-e56e-2a33-8265-bd357e7b0598 (at 10.8.29.2@o2ib6) reconnecting Aug 14 12:20:17 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Aug 14 12:20:31 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 14 12:20:31 fir-md1-s1 kernel: LustreError: 50580:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565810341, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f086e2d98c0/0x5d9ee6cb3afc64e5 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6cb3afc6501 expref: -99 pid: 50580 timeout: 0 lvb_type: 0 Aug 14 12:20:31 fir-md1-s1 kernel: LustreError: 50580:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Aug 14 12:20:45 fir-md1-s1 kernel: LustreError: 23601:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565810355, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3501719200/0x5d9ee6cb3b148729 lrc: 3/0,1 mode: --/EX res: 
[0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 32 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23601 timeout: 0 lvb_type: 0 Aug 14 12:25:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 12:25:27 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 12:29:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 12:29:33 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 14 12:31:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 12:31:50 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 14 12:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bc83c7c5-08aa-b1e5-1dd5-b1a51ba5cb4a (at 10.8.1.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45038de000, cur 1565811340 expire 1565811190 last 1565811113 Aug 14 12:35:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 12:36:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 12:36:00 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 12:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 12:39:39 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 12:41:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 12:41:51 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 12:48:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 12:48:24 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages 
Aug 14 12:49:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 12:49:54 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 12:52:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 12:52:34 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 12:52:56 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 12:52:59 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 14 12:52:59 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 14 12:53:08 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 12:53:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 12:53:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Aug 14 12:53:30 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 12:53:30 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 14 12:53:43 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 12:53:43 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 14 12:54:05 fir-md1-s1 kernel: LNetError: 
20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 12:58:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 13:00:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 13:00:16 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 13:00:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 13:00:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 14 13:03:18 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 13:03:18 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 14 13:03:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9a46a636-d807-725a-1806-a4c05a6a1620 (at 10.8.18.24@o2ib6) reconnecting Aug 14 13:03:37 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 14 13:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 13:10:39 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 14 13:10:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 13:10:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 13:11:15 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 13:11:15 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 15 previous similar messages Aug 14 
13:14:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 13:14:40 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Aug 14 13:21:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 13:21:00 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 14 13:21:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 13:21:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 19 previous similar messages Aug 14 13:21:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 13:21:48 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 13:25:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 13:25:03 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 13:27:54 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 13:27:54 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 7 previous similar messages Aug 14 13:29:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 13:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 13:31:23 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 14 13:33:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 13:33:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 13:35:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 13:35:28 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 13:41:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 13:41:58 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 14 13:44:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 13:44:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 13:46:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 13:46:31 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 13:52:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 13:52:00 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 13:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 13:56:37 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 13:56:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 13:56:43 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 14:02:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 14:02:07 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 14:07:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 14:07:05 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 14:07:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 14:07:09 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 14:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 14:12:32 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 14:17:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 14:17:26 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 14:17:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 14:17:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 14:22:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 14:22:59 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 14 14:28:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 14:28:11 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 14:29:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 14:29:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 14:33:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 
10.8.12.12@o2ib6) Aug 14 14:33:04 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 14:38:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 14:38:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 14:41:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 14:41:38 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 14:43:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 14:43:29 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 14:47:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 14:49:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 14:49:18 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 14:51:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 14:54:30 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 14:54:30 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 14 14:54:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 14:54:34 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 14:55:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 14:55:02 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 14:55:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 14:59:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 14:59:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 15:04:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 15:04:40 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 15:10:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 15:10:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 15:10:21 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 15:11:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 15:11:33 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 15:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 15:14:41 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 15:15:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 15:20:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 15:20:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 15:21:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 15:21:35 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 15:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 15:24:41 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 15:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 15:31:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 15:33:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 15:33:18 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 15:35:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 15:35:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 15:43:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 15:43:41 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 15:46:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 15:46:15 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 15:46:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 15:46:15 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 15:51:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bb346867-aa44-266b-710a-43e0895b9e3d (at 10.8.1.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f30710d0800, cur 1565823101 expire 1565822951 last 1565822874 Aug 14 15:51:41 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 15:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 15:54:02 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 15:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 15:56:21 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 15:59:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 15:59:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 16:04:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 07e259a6-fe0a-4e5b-db69-4271194dceb4 (at 10.9.103.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30cb0a3c00, cur 1565823850 expire 1565823700 last 1565823623 Aug 14 16:04:10 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Aug 14 16:04:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 16:04:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 16:04:15 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 16:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 16:06:58 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 16:10:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 16:10:04 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 16:15:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 16:15:33 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 16:17:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 16:17:26 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 16:21:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 16:21:50 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 16:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 16:26:49 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 16:27:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 16:27:44 fir-md1-s1 kernel: Lustre: Skipped 22 previous 
similar messages Aug 14 16:32:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 16:32:10 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 16:37:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 16:37:38 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 16:38:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 16:38:05 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 16:42:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 16:42:10 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 16:45:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 16:47:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 16:47:40 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 16:48:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 16:48:08 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 16:53:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 16:53:06 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 16:58:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 16:58:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 16:58:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f0350e01-274c-a247-0abb-cb6ba82c2ce6 (at 10.8.23.15@o2ib6) Aug 14 16:58:42 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Aug 14 17:01:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 17:03:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 17:03:27 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 17:06:15 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 14 17:06:15 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 14 17:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 17:09:44 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 17:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 17:09:44 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 14 17:13:20 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 14 17:14:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 17:14:26 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 17:15:14 fir-md1-s1 kernel: Lustre: 23580:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 14 17:15:14 fir-md1-s1 kernel: Lustre: 23580:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 14 17:20:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 17:20:17 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 17:20:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 17:20:18 fir-md1-s1 kernel: Lustre: Skipped 12 
previous similar messages Aug 14 17:23:57 fir-md1-s1 kernel: Lustre: 23623:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 14 17:27:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 17:27:08 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 17:28:38 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 17:28:48 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 17:28:48 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 14 17:29:02 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 17:29:15 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 17:29:15 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Aug 14 17:30:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0bcd0825-5f35-e709-e57c-d41ae345f214 (at 10.8.23.9@o2ib6) Aug 14 17:30:30 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Aug 14 17:31:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 17:31:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 17:31:54 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 14 17:37:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 17:37:19 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 17:37:43 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 14 17:37:43 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 14 17:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 17:41:22 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 17:43:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 17:43:37 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 17:49:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 17:49:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 17:51:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 17:51:22 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 17:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 17:54:02 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 18:00:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 18:00:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar 
message Aug 14 18:01:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 18:01:41 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 18:01:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 18:04:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 18:04:21 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 18:10:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 18:10:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 18:12:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 18:12:03 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 18:14:10 fir-md1-s1 kernel: Lustre: 49252:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f051945a450 x1635211377838432/t0(0) o3->577ac993-4ad9-0dce-4697-0326d1fd44f4@10.9.107.30@o2ib4:15/0 lens 488/440 e 1 to 0 dl 1565831655 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 18:14:10 fir-md1-s1 kernel: Lustre: 49252:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 14 18:15:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 18:15:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 18:21:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 18:21:02 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 18:22:53 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 18:22:53 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 18:26:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 18:26:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 18:30:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 18:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 18:33:14 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 18:36:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 18:36:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 18:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 18:36:36 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 18:39:35 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 18:39:35 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 14 18:43:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 18:43:36 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 18:47:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 18:47:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages 
Aug 14 18:50:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 18:50:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 18:53:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 18:53:57 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 18:57:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 18:57:36 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 18:59:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 19:00:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 19:00:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 19:04:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 19:04:00 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 19:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 19:08:33 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 19:09:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 19:10:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 19:10:23 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 19:11:06 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 19:14:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 19:14:35 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 19:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 19:18:43 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 19:20:32 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 19:20:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 19:22:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3b5be929-fc95-5c11-6f3d-c23d567efdc8 (at 10.8.8.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b0be53000, cur 1565835731 expire 1565835581 last 1565835504 Aug 14 19:22:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 19:25:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 19:25:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 19:28:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 19:28:49 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 14 19:31:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5735cd86-3a30-362c-bc05-c634d3fa1859 (at 10.9.107.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4518da6c00, cur 1565836302 expire 1565836152 last 1565836075 Aug 14 19:31:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 19:32:38 fir-md1-s1 kernel: Lustre: 21484:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06b1a5ac50 x1631556996525632/t0(0) o4->6efc0e4b-1ad3-bb80-daf0-68493389a065@10.9.106.18@o2ib4:13/0 lens 488/448 e 1 to 0 dl 1565836363 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 19:32:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 19:32:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 19:35:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 19:35:14 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 14 19:38:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 19:38:57 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 19:43:14 fir-md1-s1 kernel: Lustre: 10502:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 19:43:14 fir-md1-s1 kernel: Lustre: 10502:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Aug 14 19:43:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 19:43:41 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 19:45:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 19:45:26 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 19:49:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 19:49:33 fir-md1-s1 
kernel: Lustre: Skipped 10 previous similar messages Aug 14 19:56:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 19:56:53 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 19:58:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 19:58:16 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 19:59:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 19:59:36 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 20:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 20:06:58 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 20:10:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 20:10:06 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 20:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 20:11:32 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 20:17:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 20:17:02 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 20:20:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7f8dc145-a081-da87-1da4-154358301486 (at 10.9.108.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1506fa9000, cur 1565839257 expire 1565839107 last 1565839030 Aug 14 20:20:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 20:21:31 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 20:21:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 20:21:56 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 20:24:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 20:24:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 14 20:27:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 20:27:25 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 14 20:30:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 47f03a4e-a86a-0507-d5b8-a28a42ed3e02 (at 10.9.108.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f452439f000, cur 1565839858 expire 1565839708 last 1565839631 Aug 14 20:30:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 20:31:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 47f03a4e-a86a-0507-d5b8-a28a42ed3e02 (at 10.9.108.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148afb7800, cur 1565839865 expire 1565839715 last 1565839638 Aug 14 20:31:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 20:32:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 20:32:01 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 20:36:13 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 20:36:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 20:37:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 20:37:33 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 20:41:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 20:43:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 20:43:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 20:47:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 20:47:59 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 20:49:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 20:49:22 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 20:51:12 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841065/real 1565841065] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841072 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 14 20:51:19 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841072/real 1565841072] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841079 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:51:20 fir-md1-s1 kernel: Lustre: 23618:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26ec6dc500 x1631611207084320/t0(0) o101->8c191431-c80e-a99c-d724-6274df7fd787@10.9.102.10@o2ib4:25/0 lens 480/568 e 1 to 0 dl 1565841085 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 20:51:26 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841079/real 1565841079] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841086 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:51:33 
fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841086/real 1565841086] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841093 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:51:40 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841093/real 1565841093] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841100 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:51:54 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841107/real 1565841107] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841114 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:51:54 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 14 20:52:15 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841128/real 1565841128] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841135 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:52:15 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 14 20:52:51 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841163/real 1565841163] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841170 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:52:51 fir-md1-s1 kernel: Lustre: 
20459:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 14 20:53:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8c191431-c80e-a99c-d724-6274df7fd787 (at 10.9.102.10@o2ib4) reconnecting Aug 14 20:53:33 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 20:54:01 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565841234/real 1565841234] req@ffff8f0bba58b300 x1636765329711232/t0(0) o106->fir-MDT0002@10.9.108.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1565841241 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 20:54:01 fir-md1-s1 kernel: Lustre: 20459:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 14 20:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f649fa1c-e7fe-d613-2a65-337c97d2e136 (at 10.9.108.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252cbdf400, cur 1565841244 expire 1565841094 last 1565841017 Aug 14 20:54:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f649fa1c-e7fe-d613-2a65-337c97d2e136 (at 10.9.108.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14d2221000, cur 1565841249 expire 1565841099 last 1565841022 Aug 14 20:54:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 20:58:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 20:58:06 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 21:01:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 21:01:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 21:01:17 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 21:03:55 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 21:03:55 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 14 21:04:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 21:04:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 21:07:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 28e8546e-20d5-271a-3158-9e1049ea2c7b (at 10.8.7.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25019c7400, cur 1565842060 expire 1565841910 last 1565841833 Aug 14 21:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 28e8546e-20d5-271a-3158-9e1049ea2c7b (at 10.8.7.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24efdfa400, cur 1565842075 expire 1565841925 last 1565841848 Aug 14 21:08:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 21:08:30 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 21:13:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 21:13:12 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 14 21:15:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 21:15:16 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 14 21:15:41 fir-md1-s1 kernel: Lustre: 27316:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565842534/real 1565842534] req@ffff8f44afae3000 x1636765374128832/t0(0) o104->fir-MDT0002@10.8.20.27@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565842541 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 14 21:15:49 fir-md1-s1 kernel: Lustre: 10362:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f43372c1e00 x1638245260840496/t0(0) o101->27ef9320-178f-53ac-b738-4bc2f228a23d@10.9.0.63@o2ib4:24/0 lens 1792/3288 e 1 to 0 dl 1565842554 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 21:16:02 fir-md1-s1 kernel: Lustre: 27316:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565842555/real 1565842555] req@ffff8f44afae3000 x1636765374128832/t0(0) o104->fir-MDT0002@10.8.20.27@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565842562 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 21:16:02 fir-md1-s1 kernel: Lustre: 27316:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 14 21:16:09 fir-md1-s1 kernel: LustreError: 27316:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.27@o2ib6) failed to reply to 
blocking AST (req@ffff8f44afae3000 x1636765374128832 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f28aecbca40/0x5d9ee6cd03467d71 lrc: 4/0,0 mode: PR/PR res: [0x2c002c5c4:0x1960b:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.20.27@o2ib6 remote: 0xc4e27561c88dc053 expref: 8 pid: 21380 timeout: 4957651 lvb_type: 0 Aug 14 21:16:09 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.27@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 14 21:16:09 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.20.27@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f28aecbca40/0x5d9ee6cd03467d71 lrc: 3/0,0 mode: PR/PR res: [0x2c002c5c4:0x1960b:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.20.27@o2ib6 remote: 0xc4e27561c88dc053 expref: 9 pid: 21380 timeout: 0 lvb_type: 0 Aug 14 21:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6a18ff26-2f90-35f3-8dc0-c084882f2a83 (at 10.8.18.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24f474fc00, cur 1565842620 expire 1565842470 last 1565842393 Aug 14 21:17:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 21:18:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 21:18:45 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 14 21:23:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 534e10c9-e8b6-b009-609a-c6de708bb45f (at 10.8.27.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34c0d4dc00, cur 1565843022 expire 1565842872 last 1565842795 Aug 14 21:23:42 fir-md1-s1 kernel: Lustre: Skipped 145 previous similar messages Aug 14 21:24:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 21:24:47 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 21:27:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 21:27:05 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 21:28:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 21:28:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 21:30:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 21:34:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 21:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 21:37:09 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 14 21:38:43 fir-md1-s1 kernel: Lustre: 23688:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 21:38:43 fir-md1-s1 kernel: Lustre: 23688:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 14 21:38:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 21:38:59 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 21:45:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 21:45:54 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 21:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 21:47:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 14 21:49:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6526b6a6-ab94-956d-8187-cfeb5d055fea (at 10.8.13.29@o2ib6) Aug 14 21:49:03 fir-md1-s1 kernel: Lustre: Skipped 103 previous similar messages Aug 14 21:56:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 21:56:25 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 21:57:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 21:57:19 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 14 21:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 21:59:07 fir-md1-s1 kernel: Lustre: Skipped 130 previous similar messages Aug 14 22:07:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 22:07:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 22:07:20 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 22:08:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 22:08:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 22:09:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 22:09:10 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 14 22:14:42 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 14 22:14:42 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Aug 14 22:17:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 22:17:22 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 22:19:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 22:19:12 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 14 22:19:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 22:19:17 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 14 22:21:01 fir-md1-s1 kernel: LustreError: 137-5: 
fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 22:28:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 22:28:01 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 22:29:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 22:29:22 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 14 22:29:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 22:29:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 22:32:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3f9b2680-6bb8-52e4-9927-72846e6311ee (at 10.9.101.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33ad3e6c00, cur 1565847160 expire 1565847010 last 1565846933 Aug 14 22:32:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 22:32:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3f9b2680-6bb8-52e4-9927-72846e6311ee (at 10.9.101.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fcfc6000, cur 1565847173 expire 1565847023 last 1565846946 Aug 14 22:32:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 22:39:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 22:39:46 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 14 22:39:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 22:39:46 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 14 22:44:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 22:44:08 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 22:50:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 14 22:51:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 22:51:27 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 22:51:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 22:51:27 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 14 22:53:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 14 22:55:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 22:55:16 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 23:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 23:01:43 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 14 23:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 23:01:43 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 23:07:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 23:07:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 23:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 04031d35-e75a-0623-0a2e-3f8a84f80ab5 (at 10.8.27.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2522f48400, cur 1565849439 expire 1565849289 last 1565849212 Aug 14 23:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 04031d35-e75a-0623-0a2e-3f8a84f80ab5 (at 10.8.27.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f18866a5400, cur 1565849456 expire 1565849306 last 1565849229 Aug 14 23:10:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 14 23:11:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 23:11:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 23:11:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 23:11:52 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 14 23:17:53 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565849866/real 1565849866] req@ffff8f17024b7500 x1636766041624768/t0(0) o104->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565849873 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 14 23:17:53 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 14 23:18:00 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565849873/real 1565849873] req@ffff8f17024b7500 x1636766041624768/t0(0) o104->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565849880 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 23:18:00 fir-md1-s1 kernel: Lustre: 97639:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 14 23:18:01 fir-md1-s1 kernel: Lustre: 21446:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1e58548900 x1638239482814032/t0(0) o101->6bb1b23c-28f8-153d-8cc1-2ff0115f9167@10.9.106.58@o2ib4:6/0 lens 1792/3288 e 1 to 0 dl 1565849886 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 23:18:09 fir-md1-s1 kernel: Lustre: 23605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1565849882/real 1565849882] req@ffff8f2fd1f53c00 x1636766041702608/t0(0) o104->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1565849889 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 14 23:18:09 fir-md1-s1 kernel: Lustre: 23605:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 14 23:18:13 fir-md1-s1 kernel: Lustre: 23660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0edaff5700 x1631547070601872/t0(0) o101->9f2ddc86-65fa-8a70-8eea-d37d69d7c71f@10.9.106.64@o2ib4:18/0 lens 1792/3288 e 0 to 0 dl 1565849898 ref 2 fl Interpret:/0/0 rc 0/0 Aug 14 23:18:21 fir-md1-s1 kernel: LustreError: 97639:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.104.13@o2ib4) failed to reply to blocking AST (req@ffff8f17024b7500 x1636766041624768 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2c2066f2c0/0x5d9ee6cdda681984 lrc: 4/0,0 mode: PR/PR res: [0x2c002bfae:0x4589:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.104.13@o2ib4 remote: 0x10b58f6e0dcb70ca expref: 286 pid: 23742 timeout: 4964983 lvb_type: 0 Aug 14 23:18:21 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.104.13@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 14 23:18:21 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.104.13@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f2c2066f2c0/0x5d9ee6cdda681984 lrc: 3/0,0 mode: PR/PR res: [0x2c002bfae:0x4589:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.104.13@o2ib4 remote: 0x10b58f6e0dcb70ca expref: 287 pid: 23742 timeout: 0 lvb_type: 0 Aug 14 23:19:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2b4ee31a-0b6c-fe41-375d-c2f919794d53 (at 10.9.104.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f45361df800, cur 1565849947 expire 1565849797 last 1565849720 Aug 14 23:19:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6576db86-576c-58c4-907d-b54174076c6b (at 10.9.104.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251c289400, cur 1565849949 expire 1565849799 last 1565849722 Aug 14 23:19:09 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Aug 14 23:20:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 14 23:20:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 14 23:20:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7bc73a34-3e7e-c2f0-81f2-d0da70be75c4 (at 10.9.108.26@o2ib4) in 192 seconds. I think it's dead, and I am evicting it. exp ffff8f34f27c9800, cur 1565850023 expire 1565849873 last 1565849831 Aug 14 23:20:23 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 23:20:58 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 73d49704-9b98-8e71-10d4-706f06ea75ad (at 10.9.108.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f251bba9800, cur 1565850058 expire 1565849908 last 1565849831 Aug 14 23:20:58 fir-md1-s1 kernel: Lustre: Skipped 133 previous similar messages Aug 14 23:21:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 23:21:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 14 23:22:24 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 14 23:22:24 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Aug 14 23:22:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6efc0e4b-1ad3-bb80-daf0-68493389a065 (at 10.9.106.18@o2ib4) reconnecting Aug 14 23:22:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 14 23:32:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 23:32:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 14 23:32:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 23:32:03 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 14 23:32:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 23:32:57 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 14 23:42:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 14 23:42:47 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 14 23:44:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 14 23:44:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 14 23:49:49 fir-md1-s1 kernel: Lustre: MGS: Received 
new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 14 23:49:49 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 14 23:52:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 14 23:52:59 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Aug 14 23:55:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 14 23:55:21 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 00:01:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 00:01:04 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 00:03:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ef5a936a-063b-f6be-8aff-dbcdc36084a8 (at 10.8.17.19@o2ib6) Aug 15 00:03:36 fir-md1-s1 kernel: Lustre: Skipped 238 previous similar messages Aug 15 00:05:37 fir-md1-s1 kernel: Lustre: 23698:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 00:05:37 fir-md1-s1 kernel: Lustre: 23698:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Aug 15 00:05:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 00:05:41 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 15 00:05:42 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 00:05:42 fir-md1-s1 kernel: Lustre: 21073:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Aug 15 00:13:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 00:13:56 fir-md1-s1 kernel: Lustre: Skipped 27 previous 
similar messages Aug 15 00:15:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 00:15:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 00:16:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 00:16:23 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 00:16:43 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 15 00:24:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 00:24:10 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 00:28:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 00:28:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 00:29:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 00:29:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 00:33:45 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 00:33:45 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 15 00:35:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 00:35:14 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 00:39:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 00:39:42 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 00:41:04 fir-md1-s1 kernel: Lustre: MGS: 
Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 00:41:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 00:41:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cad9f63d-21ad-16f4-e743-10ad8845b606 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1f7eac3c00, cur 1565854912 expire 1565854762 last 1565854685 Aug 15 00:41:52 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Aug 15 00:41:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cad9f63d-21ad-16f4-e743-10ad8845b606 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4420b75c00, cur 1565854914 expire 1565854764 last 1565854687 Aug 15 00:46:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 00:46:08 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 00:49:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 00:49:48 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 00:51:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 00:51:10 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 00:51:36 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 00:52:47 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 15 00:55:55 fir-md1-s1 kernel: Lustre: 21434:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565855748/real 1565855748] req@ffff8f159f5f4800 x1636766289521808/t0(0) 
o104->fir-MDT0002@10.8.20.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565855755 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 15 00:55:55 fir-md1-s1 kernel: Lustre: 21434:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Aug 15 00:56:02 fir-md1-s1 kernel: Lustre: 21434:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565855755/real 1565855755] req@ffff8f159f5f4800 x1636766289521808/t0(0) o104->fir-MDT0002@10.8.20.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565855762 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 00:56:03 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f306f935400 x1631826214104736/t0(0) o101->efd546f5-bfcd-ae80-6d75-3dec0c2f78a2@10.8.22.21@o2ib6:8/0 lens 576/3264 e 1 to 0 dl 1565855768 ref 2 fl Interpret:/0/0 rc 0/0 Aug 15 00:56:03 fir-md1-s1 kernel: Lustre: 23597:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 15 00:56:04 fir-md1-s1 kernel: Lustre: 97645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1628834800 x1631736914667216/t0(0) o101->08764de7-8940-1784-0694-82c64d21e24f@10.8.20.8@o2ib6:9/0 lens 576/3264 e 1 to 0 dl 1565855769 ref 2 fl Interpret:/0/0 rc 0/0 Aug 15 00:56:04 fir-md1-s1 kernel: Lustre: 97645:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 15 00:56:05 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2f9c90e300 x1631736914667280/t0(0) o101->08764de7-8940-1784-0694-82c64d21e24f@10.8.20.8@o2ib6:10/0 lens 576/3264 e 1 to 0 dl 1565855770 ref 2 fl Interpret:/0/0 rc 0/0 Aug 15 00:56:05 fir-md1-s1 kernel: Lustre: 23754:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Aug 15 00:56:07 fir-md1-s1 kernel: 
Lustre: 97646:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f30ee255d00 x1631826214105552/t0(0) o101->efd546f5-bfcd-ae80-6d75-3dec0c2f78a2@10.8.22.21@o2ib6:12/0 lens 576/3264 e 1 to 0 dl 1565855772 ref 2 fl Interpret:/0/0 rc 0/0 Aug 15 00:56:07 fir-md1-s1 kernel: Lustre: 97646:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Aug 15 00:56:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 83a06808-9974-632e-f443-2d4f1351f29a (at 10.8.22.21@o2ib6) Aug 15 00:56:09 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 00:56:09 fir-md1-s1 kernel: Lustre: 21434:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565855762/real 1565855762] req@ffff8f159f5f4800 x1636766289521808/t0(0) o104->fir-MDT0002@10.8.20.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565855769 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 00:56:23 fir-md1-s1 kernel: Lustre: 21434:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565855776/real 1565855776] req@ffff8f159f5f4800 x1636766289521808/t0(0) o104->fir-MDT0002@10.8.20.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1565855783 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 00:56:23 fir-md1-s1 kernel: Lustre: 21434:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 15 00:56:23 fir-md1-s1 kernel: LustreError: 21434:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.35@o2ib6) failed to reply to blocking AST (req@ffff8f159f5f4800 x1636766289521808 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2f761ab3c0/0x5d9ee6ce3dce1204 lrc: 4/0,0 mode: PR/PR res: [0x2c002c2a8:0x795:0x0].0x0 bits 0x13/0x0 rrc: 67 type: IBT flags: 0x60200400000020 nid: 10.8.20.35@o2ib6 remote: 0x9f13927d16f6f146 expref: 14122 pid: 23746 timeout: 4970865 lvb_type: 0 Aug 15 00:56:23 fir-md1-s1 kernel: 
LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.35@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 15 00:56:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.20.35@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2f761ab3c0/0x5d9ee6ce3dce1204 lrc: 3/0,0 mode: PR/PR res: [0x2c002c2a8:0x795:0x0].0x0 bits 0x13/0x0 rrc: 67 type: IBT flags: 0x60200400000020 nid: 10.8.20.35@o2ib6 remote: 0x9f13927d16f6f146 expref: 14123 pid: 23746 timeout: 0 lvb_type: 0 Aug 15 00:57:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fe523cfb-439a-b198-807e-b8ff69f0446a (at 10.8.20.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ae1d6a400, cur 1565855849 expire 1565855699 last 1565855622 Aug 15 00:57:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 00:59:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 00:59:52 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 01:01:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 01:01:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 01:06:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 01:06:19 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 15 01:09:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 01:09:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 01:12:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Aug 15 01:12:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 01:12:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 01:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 01:16:28 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 01:18:45 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 01:20:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 01:20:09 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 01:22:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 01:22:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 01:26:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 01:26:29 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 15 01:31:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 01:31:10 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 01:32:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 01:32:54 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 01:37:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 01:37:28 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 01:41:19 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 01:41:19 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 01:43:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 01:43:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 01:48:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 01:48:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 01:51:58 fir-md1-s1 kernel: Lustre: 23644:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 01:52:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 01:52:06 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 01:59:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 01:59:12 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 02:00:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 02:00:35 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 02:03:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f1ea707a-d4f0-e28c-520e-dedc65041d4b (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1774c7a400, cur 1565859809 expire 1565859659 last 1565859582 Aug 15 02:03:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 02:04:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 02:04:15 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 02:09:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 02:09:20 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 15 02:11:14 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 02:11:14 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 02:13:36 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:13:36 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Aug 15 02:14:08 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:14:27 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:14:31 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:14:31 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 15 02:14:42 fir-md1-s1 kernel: Lustre: 23618:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:14:51 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:14:51 fir-md1-s1 kernel: 
Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 15 02:15:13 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:15:13 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 15 02:15:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 02:15:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 02:15:45 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 02:15:45 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Aug 15 02:20:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 02:20:50 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 02:23:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 02:23:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 02:26:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 02:26:41 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 02:31:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 02:31:15 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 02:37:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 02:37:32 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 02:38:36 fir-md1-s1 kernel: Lustre: 
MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 02:38:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 02:40:41 fir-md1-s1 kernel: Lustre: 23558:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:40:41 fir-md1-s1 kernel: Lustre: 23558:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages Aug 15 02:41:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 02:41:48 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 02:48:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 02:48:45 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 02:48:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 02:48:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 02:50:26 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 02:50:26 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 35 previous similar messages Aug 15 02:52:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 02:52:52 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 02:58:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 02:58:48 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 03:02:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 03:02:58 fir-md1-s1 kernel: Lustre: 
Skipped 22 previous similar messages Aug 15 03:04:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 03:05:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 03:06:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 03:06:50 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 03:08:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 03:08:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 03:13:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 03:13:15 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 03:14:29 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 03:14:29 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 20 previous similar messages Aug 15 03:15:27 fir-md1-s1 kernel: Lustre: 10504:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 03:15:27 fir-md1-s1 kernel: Lustre: 10504:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 15 03:15:35 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 03:15:35 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages Aug 15 03:18:55 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 03:18:55 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 03:19:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 03:19:23 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 03:21:53 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 03:21:53 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Aug 15 03:22:36 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 03:22:36 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Aug 15 03:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 03:23:19 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 03:25:12 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 03:25:12 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 35 previous similar messages Aug 15 03:29:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 03:29:00 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 03:29:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 03:29:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 03:33:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 03:33:40 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 03:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 03:39:09 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 03:41:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 03:41:44 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 03:44:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 03:44:12 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 03:50:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 03:50:11 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 03:54:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 03:54:21 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 03:54:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 03:54:21 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 04:01:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 04:01:10 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 04:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f7046cb3-2424-716e-2c65-ff120c380048 (at 10.9.108.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ec7b78800, cur 1565867032 expire 1565866882 last 1565866805 Aug 15 04:03:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 04:04:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f7046cb3-2424-716e-2c65-ff120c380048 (at 10.9.108.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0841995800, cur 1565867047 expire 1565866897 last 1565866820 Aug 15 04:04:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 04:04:24 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 04:04:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 04:04:24 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 04:08:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 15 04:11:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 04:11:53 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 04:15:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 04:15:06 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 04:17:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 04:17:06 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 04:19:53 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 04:19:53 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Aug 15 04:22:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 04:22:26 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 04:23:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a93aa789-84da-e96b-b86c-3842f65e9124 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1926688400, cur 1565868185 expire 1565868035 last 1565867958 Aug 15 04:23:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 04:25:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 15 04:26:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 04:26:33 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 15 04:30:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 04:30:46 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 04:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 04:34:00 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 15 04:38:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 04:38:04 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 04:39:50 fir-md1-s1 kernel: Lustre: 23701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 04:43:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 04:43:29 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 04:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 04:44:24 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 04:49:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 04:49:27 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 04:54:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 04:54:03 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 04:54:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) 
reconnecting Aug 15 04:54:27 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 04:59:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 04:59:30 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 05:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 05:05:00 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 05:05:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 05:05:55 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 05:10:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 05:10:03 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 05:13:32 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 05:13:32 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Aug 15 05:15:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 05:15:59 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 05:16:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 05:16:01 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 05:20:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 05:20:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 05:24:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no 
target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 05:26:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 05:26:06 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 05:27:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 05:27:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 05:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 05:30:37 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 05:36:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 05:36:33 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 05:40:19 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 05:40:19 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 05:40:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 05:40:40 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 05:46:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 05:46:41 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 05:50:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 05:50:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 05:50:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 05:50:53 fir-md1-s1 kernel: Lustre: 
Skipped 16 previous similar messages Aug 15 05:56:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 05:56:49 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 06:01:04 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 06:01:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 06:01:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 06:01:04 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 06:07:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 06:07:28 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 06:11:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 06:11:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 06:11:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 06:11:24 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 06:12:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 15 06:17:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 06:17:29 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 06:22:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 06:22:18 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 06:22:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 06:22:18 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 06:28:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 06:28:55 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 06:29:29 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 06:29:29 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages Aug 15 06:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 06:33:03 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 06:35:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 06:35:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 06:39:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 06:39:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 06:44:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 06:44:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar 
messages Aug 15 06:47:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 06:47:00 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 06:49:38 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565876971/real 1565876971] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565876978 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 15 06:49:45 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565876978/real 1565876978] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565876985 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 06:49:46 fir-md1-s1 kernel: Lustre: 23599:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0726e38f00 x1637995456759376/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:21/0 lens 480/568 e 1 to 0 dl 1565876991 ref 2 fl Interpret:/0/0 rc 0/0 Aug 15 06:49:46 fir-md1-s1 kernel: Lustre: 23599:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Aug 15 06:49:52 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565876985/real 1565876985] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565876992 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 06:49:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 96f77fe0-d0c2-629d-bb62-dcf685e7e47d (at 10.9.0.61@o2ib4) reconnecting Aug 15 06:49:52 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 06:50:06 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1565876999/real 1565876999] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565877006 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 06:50:06 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 15 06:50:27 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565877020/real 1565877020] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565877027 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 06:50:27 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 15 06:51:02 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565877055/real 1565877055] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565877062 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 06:51:02 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 15 06:52:12 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1565877125/real 1565877125] req@ffff8f13c32a3000 x1636766902885344/t0(0) o106->fir-MDT0002@10.8.28.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1565877132 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 15 06:52:12 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 15 06:52:51 fir-md1-s1 kernel: LNet: Service thread pid 23687 was inactive for 200.16s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 15 06:52:51 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Aug 15 06:52:51 fir-md1-s1 kernel: Pid: 23687, comm: mdt00_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 15 06:52:51 fir-md1-s1 kernel: Call Trace: Aug 15 06:52:51 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 15 06:52:51 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 15 06:52:51 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 15 06:52:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 15 06:52:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 15 06:52:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 15 06:52:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 15 06:52:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 15 06:52:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1565877171.23687 Aug 15 06:53:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3c321701-2950-e3a0-e425-740898be58b7 (at 10.8.28.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2adcef9c00, cur 1565877182 expire 1565877032 last 1565876955 Aug 15 06:53:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 06:53:02 fir-md1-s1 kernel: Lustre: 23687:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (210:1s); client may timeout. req@ffff8f0726e38f00 x1637995456759376/t0(0) o101->96f77fe0-d0c2-629d-bb62-dcf685e7e47d@10.9.0.61@o2ib4:21/0 lens 480/536 e 1 to 0 dl 1565877181 ref 1 fl Complete:/0/0 rc 301/301 Aug 15 06:53:02 fir-md1-s1 kernel: LNet: Service thread pid 23687 completed after 210.96s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 15 06:53:02 fir-md1-s1 kernel: LNet: Skipped 60 previous similar messages Aug 15 06:55:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 06:55:04 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 15 06:59:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 06:59:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 07:00:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 07:00:35 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 07:05:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 07:05:09 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 15 07:10:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 07:10:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 07:13:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 
07:13:50 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 07:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 07:15:35 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 07:21:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 07:21:37 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 07:23:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 07:23:53 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 07:25:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 07:25:39 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 15 07:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 07:34:25 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 07:36:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 07:36:15 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 07:36:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 07:36:15 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 07:44:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 07:44:28 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 15 07:46:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 07:46:22 fir-md1-s1 kernel: Lustre: Skipped 19 previous 
similar messages Aug 15 07:47:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 07:47:16 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 15 07:53:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 07:54:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 07:54:58 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 07:56:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 07:56:25 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 07:57:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 07:57:49 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 08:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 08:05:00 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 08:06:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 08:06:58 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 15 08:08:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 08:08:12 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 08:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 08:17:27 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 08:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 08:17:27 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 08:19:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 08:19:47 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 08:27:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 08:27:31 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 08:27:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 08:27:31 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 08:29:53 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 08:29:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 08:38:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 08:38:05 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 08:38:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 08:38:05 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 08:39:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 08:39:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 08:43:39 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 08:43:39 fir-md1-s1 kernel: Lustre: 23649:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 15 08:48:37 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 08:48:37 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 08:48:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 08:48:37 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 08:50:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 08:50:27 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 08:54:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 08:55:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 08:56:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 15 08:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 08:59:28 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 08:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 08:59:28 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 09:00:57 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 09:00:57 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 09:10:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 09:10:01 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 09:10:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 09:10:33 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 09:14:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f2ce97f7-c32a-3cbc-d332-76035bc3336e (at 10.8.8.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e6d4dc400, cur 1565885673 expire 1565885523 last 1565885446 Aug 15 09:14:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 09:16:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 09:16:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 09:20:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 09:20:09 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 09:20:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 09:20:37 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 09:27:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 09:27:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 09:30:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 09:30:41 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 15 09:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 09:31:08 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 09:38:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 09:38:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 09:40:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 09:40:47 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 15 09:40:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6fcbfae4-beb4-a736-0646-7b8358a43b96 (at 
10.9.112.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f40754c0800, cur 1565887257 expire 1565887107 last 1565887030 Aug 15 09:40:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 09:41:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 09:41:38 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 09:46:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 65c7cbb7-edd7-61f5-c144-1ffbb9efedd7 (at 10.8.1.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19fc3f5400, cur 1565887582 expire 1565887432 last 1565887355 Aug 15 09:46:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 09:49:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 09:49:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 09:50:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 09:50:48 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 09:51:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 09:51:45 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 10:00:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 10:00:25 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 10:00:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 10:00:50 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 10:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) 
reconnecting Aug 15 10:01:47 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 10:11:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 10:11:24 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Aug 15 10:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 10:14:08 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 10:15:56 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 10:15:56 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 10:21:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 10:21:52 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 10:25:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 10:26:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 10:26:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 10:26:29 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 10:26:29 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 15 10:26:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 15 10:32:57 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 15 10:32:58 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 15 10:33:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 19effcd6-8030-8ae1-d9d6-24266f7c8d3c (at 10.8.27.35@o2ib6) Aug 15 10:33:04 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 10:34:50 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 15 10:37:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 10:37:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 10:37:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 10:37:59 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 10:44:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 10:44:45 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 10:48:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 10:48:36 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 10:48:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 10:48:46 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 10:54:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 10:54:57 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 
15 10:58:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6ff7c780-d98e-a70d-dac1-f6d1d9dcb050 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f392a2b7400, cur 1565891903 expire 1565891753 last 1565891676 Aug 15 10:58:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 11:00:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 11:00:57 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 11:02:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 11:02:50 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 15 11:05:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 11:05:02 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 11:11:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 11:11:43 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 11:13:17 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 11:13:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 11:15:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 11:15:50 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 15 11:21:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 11:21:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 11:25:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 
15 11:25:59 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 11:30:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 11:30:33 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 11:32:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 11:32:07 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 11:36:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 11:36:03 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 15 11:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 11:42:11 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 11:42:38 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 11:42:38 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 11:46:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 11:46:11 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 11:52:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 11:52:20 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 11:56:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 11:56:41 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 11:58:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 11:58:45 fir-md1-s1 kernel: Lustre: Skipped 5 
previous similar messages Aug 15 12:02:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 12:02:37 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 12:06:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 12:06:59 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 12:08:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 12:08:48 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 12:12:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 12:12:43 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 12:19:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 12:19:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 12:19:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 12:19:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 12:22:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 12:22:59 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 15 12:29:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 12:29:14 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 12:30:23 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 12:30:23 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 12:33:34 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 12:33:34 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 12:39:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 12:39:59 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 12:42:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 12:42:16 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 12:43:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 12:43:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 12:43:56 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 12:47:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 10947b9b-0f5a-c39c-5999-172515a96889 (at 10.8.8.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520d18800, cur 1565898457 expire 1565898307 last 1565898230 Aug 15 12:47:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 12:47:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e1a59269-1e35-4f3a-9f6e-6757b3fc4759 (at 10.8.8.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f350dfd0400, cur 1565898469 expire 1565898319 last 1565898242 Aug 15 12:47:49 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 12:50:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 12:50:03 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Aug 15 12:52:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 12:52:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 12:53:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.10.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 15 12:53:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 12:53:58 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 12:57:11 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 15 13:00:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 13:00:08 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 13:01:34 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 15 13:01:34 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Aug 15 13:04:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 13:04:13 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 15 13:04:42 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing 
former export from same NID Aug 15 13:04:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 13:10:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 13:10:55 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 13:13:56 fir-md1-s1 kernel: Lustre: 23567:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 13:14:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 13:14:39 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 13:16:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 13:16:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 13:21:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 13:21:40 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 13:24:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 13:24:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 13:27:59 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 13:27:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 13:31:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 13:31:43 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 13:34:39 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 13:34:39 fir-md1-s1 kernel: Lustre: 
10501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 15 13:35:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 13:35:22 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 13:40:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 13:40:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 13:42:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 13:42:08 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 13:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 13:45:23 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 13:50:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 13:50:48 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 13:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 13:52:10 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 13:56:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 13:56:43 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 14:02:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 14:02:12 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 14:02:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 14:02:12 fir-md1-s1 kernel: Lustre: Skipped 14 
previous similar messages Aug 15 14:07:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 14:07:02 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 14:12:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 14:12:15 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 14:16:39 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 14:16:39 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 14:17:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 14:17:06 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 15 14:19:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5c524413-3d60-74d7-2ea6-e33094f815fc (at 10.8.8.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2522eb5000, cur 1565903960 expire 1565903810 last 1565903733 Aug 15 14:19:20 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 14:22:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 14:22:32 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 14:27:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 14:27:35 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 14:27:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 14:27:45 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 14:27:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a471d462-4d6b-3154-5b44-8ba012019709 (at 10.8.16.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31bc1b5c00, cur 1565904466 expire 1565904316 last 1565904239 Aug 15 14:27:46 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Aug 15 14:32:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 14:32:34 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 14:32:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2520981800, cur 1565904767 expire 1565904617 last 1565904540 Aug 15 14:32:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 14:38:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3d29c3e1-3431-278f-589f-781a7b3c90ae (at 10.8.16.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f213d1a4800, cur 1565905090 expire 1565904940 last 1565904863 Aug 15 14:38:10 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 14:38:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 14:38:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 14:39:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 14:39:40 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 14:43:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 14:43:32 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 15 14:50:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 14:50:26 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 14:50:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 14:50:37 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 15 14:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 14:54:05 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 15:00:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 15:00:43 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 15:02:06 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 15:02:06 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 15:02:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4723ec0a-718a-1580-f873-3aa51d5d57b9 (at 
10.9.112.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f374fec7800, cur 1565906531 expire 1565906381 last 1565906304 Aug 15 15:02:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 15 15:02:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4723ec0a-718a-1580-f873-3aa51d5d57b9 (at 10.9.112.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20314bac00, cur 1565906532 expire 1565906382 last 1565906305 Aug 15 15:02:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 15:04:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 15:04:50 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 15:11:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 15:11:15 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 15:13:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 15:13:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 15:14:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 15:14:56 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 15 15:21:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 15:21:39 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 15 15:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 15:25:04 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 15:26:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from 
same NID Aug 15 15:26:15 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 15:31:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 15:31:53 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Aug 15 15:35:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e4809e9d-cd93-fce4-b050-67f299926009 (at 10.9.101.67@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4510302400, cur 1565908506 expire 1565908356 last 1565908279 Aug 15 15:35:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 15:35:55 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 15 15:36:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 61cb72b3-3a74-fe53-6d72-265066c3dd24 (at 10.9.105.30@o2ib4) in 225 seconds. I think it's dead, and I am evicting it. exp ffff8f2530be5000, cur 1565908582 expire 1565908432 last 1565908357 Aug 15 15:36:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 15:36:24 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 67c36890-bb5c-df64-af0e-2b910553363b (at 10.9.105.30@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14d2226400, cur 1565908584 expire 1565908434 last 1565908357 Aug 15 15:37:44 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.12@o2ib6, removing former export from same NID Aug 15 15:37:44 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 15 15:43:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 64093eed-1899-7457-95e6-ff7526581ffb (at 10.8.10.21@o2ib6) reconnecting Aug 15 15:43:14 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 15:45:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 25fbf076-bd59-1bcf-bd3c-d51e0d40cd71 (at 10.8.28.1@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1e83e88800, cur 1565909105 expire 1565908955 last 1565908878 Aug 15 15:46:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 15:46:23 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 15 15:49:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 15:49:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 15:53:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 15:53:41 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Aug 15 15:56:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 15 15:56:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 15 15:59:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.21@o2ib6, removing former export from same NID Aug 15 15:59:22 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 15 16:03:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bf3478cc-569b-5c14-1a71-20ca1e1f08aa (at 10.8.12.12@o2ib6) reconnecting Aug 15 16:03:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 16:07:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1b7aecc8-a455-dad2-efc3-59dffd90c0d4 (at 10.8.12.12@o2ib6) Aug 15 16:07:01 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Aug 15 16:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9a45d422-ab0d-40e3-2c86-a284b38f93e2 (at 10.8.12.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2501a9ec00, cur 1565910718 expire 1565910568 last 1565910491 Aug 15 16:11:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 16:12:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 63331ef8-e7f6-019b-65ae-b4aad7ec4d2c (at 10.8.14.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1771036400, cur 1565910722 expire 1565910572 last 1565910495 Aug 15 16:12:02 fir-md1-s1 kernel: Lustre: Skipped 127 previous similar messages Aug 15 16:26:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e6770383-b71d-26bd-2ffa-8df05e7f3814 (at 10.8.9.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2538a09000, cur 1565911581 expire 1565911431 last 1565911354 Aug 15 16:26:21 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Aug 15 16:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 08221c4d-680b-0eb0-dfa4-ec6a7d978740 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1e8d13fc00, cur 1565911592 expire 1565911442 last 1565911365 Aug 15 16:26:32 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 15 16:36:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 15 16:36:41 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 15 16:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5de47ea0-5b83-faaf-1000-94a17b3b724d (at 10.8.24.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3112d74400, cur 1565913358 expire 1565913208 last 1565913131 Aug 15 16:55:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 15 17:15:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f37bcae1-5968-c42e-e3ff-512bc53b2aa0 (at 10.8.19.5@o2ib6) Aug 15 17:15:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 17:18:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 826fbeb7-54e9-5127-860e-c32891bc78a7 (at 10.9.107.9@o2ib4) Aug 15 17:18:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 17:19:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2a623c4c-4373-ca2f-1d79-a53d0df03e0e (at 10.9.107.13@o2ib4) Aug 15 17:19:16 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 17:25:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 08f3673c-126c-901e-0cce-a4f9fb4302d2 (at 10.8.24.31@o2ib6) Aug 15 17:25:36 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 15 17:33:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.20@o2ib4) Aug 15 17:33:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 19:26:38 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:26:38 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 15 19:26:38 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:26:38 fir-md1-s1 kernel: Lustre: 23598:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages Aug 15 19:26:43 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:26:43 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar 
messages Aug 15 19:26:49 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:26:49 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 48 previous similar messages Aug 15 19:26:54 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:26:54 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 35 previous similar messages Aug 15 19:27:17 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:27:17 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages Aug 15 19:39:19 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:39:19 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 46 previous similar messages Aug 15 19:39:29 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:39:29 fir-md1-s1 kernel: Lustre: 10505:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 34 previous similar messages Aug 15 19:39:37 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:39:37 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 39 previous similar messages Aug 15 19:39:47 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:39:47 fir-md1-s1 kernel: Lustre: 23562:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 69 previous similar messages Aug 15 
19:43:35 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 19:43:35 fir-md1-s1 kernel: Lustre: 23575:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 52 previous similar messages Aug 15 20:06:37 fir-md1-s1 kernel: Lustre: 23612:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 20:06:37 fir-md1-s1 kernel: Lustre: 23612:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 119 previous similar messages Aug 15 20:06:45 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 20:06:45 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 18 previous similar messages Aug 15 20:06:54 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 20:06:54 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 47 previous similar messages Aug 15 20:07:26 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 20:07:26 fir-md1-s1 kernel: Lustre: 20458:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 72 previous similar messages Aug 15 20:10:46 fir-md1-s1 kernel: Lustre: 23612:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 20:10:46 fir-md1-s1 kernel: Lustre: 23612:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 74 previous similar messages Aug 15 20:16:26 fir-md1-s1 kernel: Lustre: 23612:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 15 20:16:26 fir-md1-s1 kernel: Lustre: 23612:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 227 previous similar messages Aug 15 20:45:31 
fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 76975a67-f71d-74d8-834c-f4abd2ccf661 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f084b9800, cur 1565927131 expire 1565926981 last 1565926904 Aug 15 20:45:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 15 20:46:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 15 20:46:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 21:35:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 24ab177c-fa53-1ad7-a4b8-75ee3a88aec0 (at 10.8.8.24@o2ib6) Aug 15 21:35:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 15 23:44:01 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 15 23:52:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 83a01ec4-b761-2db2-1866-20fa1191b2b5 (at 10.8.8.21@o2ib6) Aug 15 23:52:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 01:27:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8ffea0b0-16ef-b612-6672-2f72cade106a (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f24efe71000, cur 1565944025 expire 1565943875 last 1565943798 Aug 16 01:27:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 01:27:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d8cc7b58-ee01-5501-ca65-c659f4724147 (at 10.9.106.54@o2ib4) Aug 16 01:27:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 02:13:53 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 16 02:13:53 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 60 previous similar messages Aug 16 02:45:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 89b1cce5-a5f8-09b9-5965-85454348ab62 (at 10.8.21.23@o2ib6) Aug 16 02:45:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 03:11:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 746b3d8e-c221-65e3-9e0b-3d48071d79a2 (at 10.9.0.81@o2ib4) reconnecting Aug 16 03:11:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 16 03:11:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 22280de3-e127-2943-2417-f27756433740 (at 10.9.0.81@o2ib4) Aug 16 03:11:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 05:29:38 fir-md1-s1 kernel: Lustre: 23698:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 16 06:14:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e8acca8-b792-7251-99a2-2c096d3acabd (at 10.9.101.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ede647c00, cur 1565961274 expire 1565961124 last 1565961047 Aug 16 06:14:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 06:14:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e8acca8-b792-7251-99a2-2c096d3acabd (at 10.9.101.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f375c09b400, cur 1565961285 expire 1565961135 last 1565961058 Aug 16 06:14:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 16 06:14:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b9ab8d1c-186b-e91c-f7e2-2b46a5aaa7f7 (at 10.9.101.18@o2ib4) Aug 16 07:04:53 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 16 07:19:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f2ac4397-2e51-a615-ba22-d10920eaecbc (at 10.9.116.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0b40697800, cur 1565965194 expire 1565965044 last 1565964967 Aug 16 07:20:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f2ac4397-2e51-a615-ba22-d10920eaecbc (at 10.9.116.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f15cada8800, cur 1565965207 expire 1565965057 last 1565964980 Aug 16 07:20:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 16 08:27:46 fir-md1-s1 kernel: Lustre: 10308:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 16 08:48:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1543573c-a149-4abc-6397-9ebe639ad4e4 (at 10.8.8.37@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2423b19c00, cur 1565970492 expire 1565970342 last 1565970265 Aug 16 09:13:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 12040153-fd2d-6649-96ec-d96366b26c43 (at 10.8.9.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20d501d800, cur 1565972016 expire 1565971866 last 1565971789 Aug 16 09:13:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 09:21:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1c5528cf-e327-f18f-7a2a-f18376f36b76 (at 10.8.19.5@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f2509e72c00, cur 1565972480 expire 1565972330 last 1565972253 Aug 16 09:21:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 16 09:29:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 219fc99a-1206-4f42-4b15-ea15722edea9 (at 10.9.116.7@o2ib4) Aug 16 09:29:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 09:43:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f37bcae1-5968-c42e-e3ff-512bc53b2aa0 (at 10.8.19.5@o2ib6) Aug 16 09:43:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 10:01:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 28404991-8d4b-34f2-2c6d-5eaed62a4d2d (at 10.9.107.2@o2ib4) Aug 16 10:01:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 11:23:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3dfded4-9107-609c-ae57-2966b58f71e4 (at 10.8.9.1@o2ib6) Aug 16 11:23:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 11:24:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1f29c7a2-d2d3-0a98-27b0-578e87d088ab (at 10.8.9.2@o2ib6) Aug 16 11:24:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 12:08:42 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 16 12:08:42 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 16 12:50:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7370c1e7-1463-138f-49f5-2aacbf397d6b (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1672aa8000, cur 1565985033 expire 1565984883 last 1565984806 Aug 16 12:50:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 12:51:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 290d6608-03d4-0bb1-48e8-288d4a314d54 (at 10.8.28.9@o2ib6) Aug 16 12:51:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 14:10:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e4778e0-faf4-6b63-18f0-b58b9b1b1ccf (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2db2397c00, cur 1565989837 expire 1565989687 last 1565989610 Aug 16 14:10:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 14:10:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e4778e0-faf4-6b63-18f0-b58b9b1b1ccf (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2db2391c00, cur 1565989849 expire 1565989699 last 1565989622 Aug 16 14:10:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 16 14:11:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 290d6608-03d4-0bb1-48e8-288d4a314d54 (at 10.8.28.9@o2ib6) Aug 16 14:11:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 14:59:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 99f7d89b-ef56-ef37-e49d-05a66366121f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1fedd46400, cur 1565992751 expire 1565992601 last 1565992524 Aug 16 15:00:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 15:00:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 15:33:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8da0de24-d247-1a16-14e0-f8e47d1b1dc5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c647c2c00, cur 1565994811 expire 1565994661 last 1565994584 Aug 16 15:33:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 15:33:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 15:33:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 16:13:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d52e6204-86c0-9cb9-b99f-f2d4fe2ef9ff (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1768df4c00, cur 1565997193 expire 1565997043 last 1565996966 Aug 16 16:13:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 16:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d52e6204-86c0-9cb9-b99f-f2d4fe2ef9ff (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f241b69b000, cur 1565997205 expire 1565997055 last 1565996978 Aug 16 16:13:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 16 16:13:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 290d6608-03d4-0bb1-48e8-288d4a314d54 (at 10.8.28.9@o2ib6) Aug 16 16:13:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 18:14:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c7317de1-dc12-7eeb-b2c7-8bda04dd1f78 (at 10.8.7.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22f8a1d000, cur 1566004478 expire 1566004328 last 1566004251 Aug 16 18:14:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c7317de1-dc12-7eeb-b2c7-8bda04dd1f78 (at 10.8.7.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2f8db04400, cur 1566004492 expire 1566004342 last 1566004265 Aug 16 19:46:19 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566009972/real 1566009972] req@ffff8f06e73e8900 x1636771179761152/t0(0) o106->fir-MDT0000@10.8.7.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1566009979 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 16 19:46:19 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 16 19:46:27 fir-md1-s1 kernel: Lustre: 23578:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b91a08300 x1637244235165536/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:2/0 lens 480/568 e 1 to 0 dl 1566009992 ref 2 fl Interpret:/0/0 rc 0/0 Aug 16 19:46:40 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566009993/real 1566009993] req@ffff8f06e73e8900 x1636771179761152/t0(0) o106->fir-MDT0000@10.8.7.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1566010000 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 16 19:46:40 fir-md1-s1 kernel: Lustre: 21370:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 16 19:46:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 03459ba8-d420-8fa0-2983-fdf11ef807a0 (at 10.8.7.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22d36b5c00, cur 1566010004 expire 1566009854 last 1566009777 Aug 16 19:46:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 16 19:46:44 fir-md1-s1 kernel: Lustre: 21370:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:12s); client may timeout. 
req@ffff8f0b91a08300 x1637244235165536/t0(0) o101->6ee172d9-72a9-7fa2-230d-3850214207fa@10.0.10.3@o2ib7:2/0 lens 480/536 e 1 to 0 dl 1566009992 ref 1 fl Complete:/0/0 rc 301/301 Aug 16 20:08:31 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566011304/real 1566011304] req@ffff8f15afd58300 x1636771183970400/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566011311 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 16 20:08:38 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566011311/real 1566011311] req@ffff8f15afd58300 x1636771183970400/t0(0) o106->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566011318 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 16 20:08:50 fir-md1-s1 kernel: Lustre: 97640:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f17e1e22100 x1631845480215584/t0(0) o101->6b214d7a-e97a-9cf3-d622-e8901c5e5f49@10.9.107.8@o2ib4:24/0 lens 480/568 e 0 to 0 dl 1566011334 ref 2 fl Interpret:/0/0 rc 0/0 Aug 16 20:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9cde7680-2916-ed5a-8579-17827a7e41ee (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d24a6fc00, cur 1566011331 expire 1566011181 last 1566011104 Aug 16 20:08:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 20:11:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d8cc7b58-ee01-5501-ca65-c659f4724147 (at 10.9.106.54@o2ib4) Aug 16 20:11:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 20:15:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b104333c-5261-8329-169b-53042299a801 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f269789dc00, cur 1566011723 expire 1566011573 last 1566011496 Aug 16 20:15:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 20:18:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 20:18:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 20:46:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6bcb842f-f173-8050-534f-feb9ea44f002 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2bc78ea800, cur 1566013564 expire 1566013414 last 1566013337 Aug 16 20:46:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 20:47:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 20:47:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 21:19:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a9c7cd03-559d-66de-0293-0c51c8e3f2e6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ee8901400, cur 1566015597 expire 1566015447 last 1566015370 Aug 16 21:19:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 21:23:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 21:23:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 21:46:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b0bdaecf-c143-4d58-457e-674efd8c2625 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1b865c9000, cur 1566017164 expire 1566017014 last 1566016937 Aug 16 21:46:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 21:48:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 21:48:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 22:09:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a1798554-e31e-5195-85af-067aa890d4ae (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c38556c00, cur 1566018567 expire 1566018417 last 1566018340 Aug 16 22:09:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 22:12:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 22:12:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 22:29:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a660564d-3ab1-470f-9a32-0c505099c046 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f8ca43400, cur 1566019764 expire 1566019614 last 1566019537 Aug 16 22:29:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 22:29:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 22:29:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 22:51:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 07566650-9635-6356-e1b3-de62032d0f3c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1add470000, cur 1566021101 expire 1566020951 last 1566020874 Aug 16 22:51:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 22:55:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 22:55:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:01:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bedf2c2c-ae4a-9548-c756-50409f639ac7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f218b4ff800, cur 1566021664 expire 1566021514 last 1566021437 Aug 16 23:01:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:03:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 23:03:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:09:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 72964713-3837-d665-3f38-a860e6eae48d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a8b353400, cur 1566022140 expire 1566021990 last 1566021913 Aug 16 23:09:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:11:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 23:11:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:13:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 23dbfbee-8f3b-27e7-f711-fd69cc641360 (at 10.9.115.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f22886bd800, cur 1566022404 expire 1566022254 last 1566022177 Aug 16 23:13:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:30:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8a1b2e44-89d1-8bf7-2029-4a3c4c531ee0 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f4285a59400, cur 1566023432 expire 1566023282 last 1566023205 Aug 16 23:30:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 16 23:35:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 16 23:35:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:00:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 409e34d7-8d0a-aa0a-b220-c9d560285527 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3488d9e400, cur 1566025211 expire 1566025061 last 1566024984 Aug 17 00:00:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:02:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 00:02:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:09:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 36a48699-1251-585d-ecca-594068d629f1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f187cdd1000, cur 1566025746 expire 1566025596 last 1566025519 Aug 17 00:09:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:12:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 00:12:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:19:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2717122a-9287-2cb8-2bbf-e9e8572d18aa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f178d1a5400, cur 1566026389 expire 1566026239 last 1566026162 Aug 17 00:19:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:21:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 00:21:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:44:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e63cf470-c30c-1326-2b5f-5eca92e4ef98 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1eaef9d800, cur 1566027845 expire 1566027695 last 1566027618 Aug 17 00:44:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:44:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 00:44:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:50:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e2d52a3-ae3f-cb80-e2c9-dd797e0a0a11 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1afa174000, cur 1566028235 expire 1566028085 last 1566028008 Aug 17 00:50:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:50:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 00:50:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:56:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ae9796d7-54df-3839-300e-d8288a79d327 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f36992c0000, cur 1566028600 expire 1566028450 last 1566028373 Aug 17 00:56:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 00:58:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 00:58:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 01:22:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e6d105a-65f6-95bf-d54a-cccc26e57e50 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2801df5c00, cur 1566030169 expire 1566030019 last 1566029942 Aug 17 01:22:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 01:26:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 01:26:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 01:43:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a9a793cc-c69c-61ae-7937-c3ff51d91cc7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1617c8a000, cur 1566031407 expire 1566031257 last 1566031180 Aug 17 01:43:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 01:44:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9ce2fd1-c890-b482-d659-13e68f7b5529 (at 10.8.27.2@o2ib6) Aug 17 01:44:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 01:46:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 01:46:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:01:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 022b299c-8b59-745e-a5c2-4926c98386d6 (at 10.8.17.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f195fcf9c00, cur 1566032500 expire 1566032350 last 1566032273 Aug 17 02:01:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 17 02:02:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8257ad81-12d5-f269-3c44-478c2a180d99 (at 10.8.17.1@o2ib6) Aug 17 02:02:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:02:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8da54ad3-525c-fef1-a0bd-79d564e270b9 (at 10.8.26.4@o2ib6) in 160 seconds. I think it's dead, and I am evicting it. exp ffff8f2fd1f42800, cur 1566032576 expire 1566032426 last 1566032416 Aug 17 02:02:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:04:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e4d85c2b-5a4c-48d2-8577-e3ced4965a0b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f364737ac00, cur 1566032643 expire 1566032493 last 1566032416 Aug 17 02:04:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 17 02:06:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 02:06:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:13:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a1a6c362-dd17-f666-2a6e-1e6b400980b6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d85eb8400, cur 1566033204 expire 1566033054 last 1566032977 Aug 17 02:15:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 02:15:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 512cfbd5-3802-e324-1ee0-caaa7b3f6098 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f38c0661000, cur 1566034539 expire 1566034389 last 1566034312 Aug 17 02:35:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:38:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 02:38:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:44:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f305d9a6-ebd6-19e4-e844-71be972ef9a5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2179999800, cur 1566035092 expire 1566034942 last 1566034865 Aug 17 02:44:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:47:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 02:47:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:52:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3b4afe9d-8c16-b025-8ad5-53dc35cc705d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ff436f000, cur 1566035568 expire 1566035418 last 1566035341 Aug 17 02:52:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:54:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 02:54:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 02:54:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 17 02:54:27 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 17 02:54:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 03:00:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 79418964-e865-6342-ed1d-836272b9b9d3 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f291b127800, cur 1566036048 expire 1566035898 last 1566035821 Aug 17 03:00:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 03:04:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 03:04:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 03:10:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c5a77c2f-a67d-8c0a-352e-30b122245365 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f21e7701000, cur 1566036651 expire 1566036501 last 1566036424 Aug 17 03:10:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 03:15:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 03:15:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 03:34:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 39fd7a94-ae0e-614d-8d1c-1cd8cd593352 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f251ffa8400, cur 1566038066 expire 1566037916 last 1566037839 Aug 17 03:34:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 03:36:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 03:36:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 03:56:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fa794896-b19c-e46b-cad6-c1c1856934f3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30b405f400, cur 1566039371 expire 1566039221 last 1566039144 Aug 17 03:56:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 04:00:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 04:00:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 04:25:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1ebd5829-b9ec-4542-f3ae-3c7a17474d14 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1cf547cc00, cur 1566041144 expire 1566040994 last 1566040917 Aug 17 04:25:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 04:29:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 04:29:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 04:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0fc08ecd-e1d8-0840-17ce-2d761707d850 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f168e1b3400, cur 1566041808 expire 1566041658 last 1566041581 Aug 17 04:36:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 04:41:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 04:41:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 04:59:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dfdb8b96-15f6-b98e-e465-b4da32136709 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33259ea400, cur 1566043150 expire 1566043000 last 1566042923 Aug 17 04:59:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 05:00:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:00:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 17 05:00:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 17 05:00:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:00:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:03:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:28:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9b855e1-cc74-6783-dc77-fb644330ca60 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3121184000, cur 1566044909 expire 1566044759 last 1566044682 Aug 17 05:28:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 05:30:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:30:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 05:36:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1b5f8aad-5226-eace-02d1-d18950772760 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ae422bc00, cur 1566045410 expire 1566045260 last 1566045183 Aug 17 05:36:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 05:40:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:40:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 05:50:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 76ca0f04-1cc6-78b4-c487-7ef752d37def (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a83bb3800, cur 1566046258 expire 1566046108 last 1566046031 Aug 17 05:50:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 05:52:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 05:52:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 06:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client af38eaac-3932-b76a-9374-3c37705d939d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d1c0ba000, cur 1566048329 expire 1566048179 last 1566048102 Aug 17 06:25:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 06:25:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client af38eaac-3932-b76a-9374-3c37705d939d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2948020c00, cur 1566048335 expire 1566048185 last 1566048108 Aug 17 06:25:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 17 06:29:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 06:29:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 06:56:35 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050188/real 1566050188] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050195 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 17 06:56:35 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 17 06:56:42 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050195/real 1566050195] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050202 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 06:56:44 fir-md1-s1 kernel: Lustre: 20553:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f38c5289500 x1636451227305168/t0(0) o101->396c953b-51d1-e614-261e-52aee9dc8ef2@10.9.105.27@o2ib4:18/0 lens 1792/3288 e 1 to 0 dl 1566050208 ref 2 fl Interpret:/0/0 rc 0/0 Aug 17 06:56:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 396c953b-51d1-e614-261e-52aee9dc8ef2 (at 10.9.105.27@o2ib4) reconnecting Aug 17 06:56:50 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050202/real 1566050202] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050209 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 06:56:50 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Connection restored to dce245ee-1721-1fa3-f0f5-8ef6b7994bca (at 10.9.105.27@o2ib4) Aug 17 06:56:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 06:56:56 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050209/real 1566050209] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050216 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 06:57:11 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050223/real 1566050223] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050230 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 06:57:11 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 17 06:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 396c953b-51d1-e614-261e-52aee9dc8ef2 (at 10.9.105.27@o2ib4) reconnecting Aug 17 06:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to dce245ee-1721-1fa3-f0f5-8ef6b7994bca (at 10.9.105.27@o2ib4) Aug 17 06:57:32 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050245/real 1566050245] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050252 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 06:57:32 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 17 06:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 396c953b-51d1-e614-261e-52aee9dc8ef2 (at 10.9.105.27@o2ib4) reconnecting Aug 17 06:57:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to dce245ee-1721-1fa3-f0f5-8ef6b7994bca (at 10.9.105.27@o2ib4) Aug 17 06:57:37 
fir-md1-s1 kernel: Lustre: 21418:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b5569e300 x1640424876482656/t0(0) o101->b05916b7-111a-93c6-b801-46b72244d611@10.9.108.68@o2ib4:12/0 lens 576/3264 e 1 to 0 dl 1566050262 ref 2 fl Interpret:/0/0 rc 0/0 Aug 17 06:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 121f1ddf-ec4f-2be1-7af2-75a40f99121e (at 10.9.108.68@o2ib4) Aug 17 06:57:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to dce245ee-1721-1fa3-f0f5-8ef6b7994bca (at 10.9.105.27@o2ib4) Aug 17 06:58:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 121f1ddf-ec4f-2be1-7af2-75a40f99121e (at 10.9.108.68@o2ib4) Aug 17 06:58:07 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566050280/real 1566050280] req@ffff8f394d446000 x1636771411944048/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566050287 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 06:58:07 fir-md1-s1 kernel: Lustre: 23680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 17 06:58:12 fir-md1-s1 kernel: Lustre: 23585:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b8df47800 x1634183234002448/t0(0) o101->d82be57b-2f2b-1591-b61e-7d36849f0064@10.9.109.71@o2ib4:17/0 lens 576/3264 e 1 to 0 dl 1566050297 ref 2 fl Interpret:/0/0 rc 0/0 Aug 17 06:58:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 396c953b-51d1-e614-261e-52aee9dc8ef2 (at 10.9.105.27@o2ib4) reconnecting Aug 17 06:58:14 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 17 06:58:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 121f1ddf-ec4f-2be1-7af2-75a40f99121e (at 10.9.108.68@o2ib4) Aug 17 06:58:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 06:58:39 fir-md1-s1 kernel: 
Lustre: 21429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2177319200 x1634938615690672/t0(0) o101->8e6b7782-0f04-da33-0138-eab1c9e41ffb@10.8.18.25@o2ib6:14/0 lens 608/3264 e 1 to 0 dl 1566050324 ref 2 fl Interpret:/0/0 rc 0/0 Aug 17 06:58:52 fir-md1-s1 kernel: LustreError: 20541:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566050242, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f10f52c33c0/0x5d9ee6d3d8faec1f lrc: 3/1,0 mode: --/PR res: [0x2c002c8c3:0x57b3:0x0].0x0 bits 0x13/0x0 rrc: 38 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20541 timeout: 0 lvb_type: 0 Aug 17 06:58:58 fir-md1-s1 kernel: Lustre: 23588:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f30993c6600 x1641907093773584/t0(0) o101->b6ea1fd6-0014-51e1-d7da-b5b1cf7003b8@10.8.22.2@o2ib6:3/0 lens 592/3264 e 1 to 0 dl 1566050343 ref 2 fl Interpret:/0/0 rc 0/0 Aug 17 06:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 609ccabf-8284-a50f-5c00-0993767b8511 (at 10.9.109.71@o2ib4) Aug 17 06:59:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 17 06:59:03 fir-md1-s1 kernel: LustreError: 23680:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff8f394d446000 x1636771411944048 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f1867e7d580/0x5d9ee6d3d8ab37e3 lrc: 4/0,0 mode: PR/PR res: [0x2c002c8c3:0x57b3:0x0].0x0 bits 0x13/0x0 rrc: 38 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xe469cc08720999c6 expref: 329 pid: 21429 timeout: 5165545 lvb_type: 0 Aug 17 06:59:03 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 17 
06:59:03 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f1867e7d580/0x5d9ee6d3d8ab37e3 lrc: 3/0,0 mode: PR/PR res: [0x2c002c8c3:0x57b3:0x0].0x0 bits 0x13/0x0 rrc: 38 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xe469cc08720999c6 expref: 330 pid: 21429 timeout: 0 lvb_type: 0 Aug 17 07:00:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1a8f237a-bf47-56a9-c8bb-9e731d94c6d0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f17f47aa800, cur 1566050400 expire 1566050250 last 1566050173 Aug 17 07:02:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 07:20:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4e60e331-e288-0bd9-e186-6663a6dd423e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ea7291400, cur 1566051605 expire 1566051455 last 1566051378 Aug 17 07:20:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 17 07:22:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 07:22:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 07:35:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 75d584f1-e3ff-1bef-cca5-431f5dd01b2a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f248c239800, cur 1566052502 expire 1566052352 last 1566052275 Aug 17 07:35:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 07:36:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 07:36:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 07:36:35 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 17 07:36:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 17 07:36:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 08:03:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ff42532a-f779-3d3f-f5a8-b549929080e4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f72c2a800, cur 1566054217 expire 1566054067 last 1566053990 Aug 17 08:03:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 08:06:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 08:06:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 08:43:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 16ad32ef-03f0-9048-960b-a4f173398634 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0796945c00, cur 1566056624 expire 1566056474 last 1566056397 Aug 17 08:43:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 08:46:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 08:46:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 09:05:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9ef41753-7d35-5ee1-b83e-6d32c75a0d91 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f30cbd41800, cur 1566057919 expire 1566057769 last 1566057692 Aug 17 09:05:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 09:09:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 09:09:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 09:10:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.26.4@o2ib6, removing former export from same NID Aug 17 09:10:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 09:19:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 09:19:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 09:19:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c0ddad84-4ecc-e79c-8912-ad5e43754f69 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34db218400, cur 1566058797 expire 1566058647 last 1566058570 Aug 17 09:19:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 09:46:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1250eb59-4ce7-6187-c1fe-1aa86aaf0b77 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2719051c00, cur 1566060363 expire 1566060213 last 1566060136 Aug 17 09:46:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 09:51:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 09:51:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 10:06:28 fir-md1-s1 kernel: LNetError: 20197:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 17 10:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5fbb5bcf-e02a-8f47-8379-02b1db28eec7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ed312fc00, cur 1566062503 expire 1566062353 last 1566062276 Aug 17 10:21:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 10:25:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00304ea7-578d-2727-24ce-d8f8efb87890 (at 10.8.26.4@o2ib6) Aug 17 10:25:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 15:52:15 fir-md1-s1 kernel: Lustre: 23623:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 17 15:57:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2abda4bb-2707-8ea8-c73d-7d8168a15387 (at 10.9.104.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e48b99c00, cur 1566082646 expire 1566082496 last 1566082419 Aug 17 15:57:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 17:14:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 53b1abf2-8889-8c65-d99d-8f49db5faf88 (at 10.8.25.3@o2ib6) Aug 17 17:14:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 17:17:23 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566087435/real 1566087435] req@ffff8f2066bb8600 x1636771837700064/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566087442 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 17 17:17:23 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 17 17:17:30 fir-md1-s1 kernel: Lustre: 20555:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1fbf64bf00 x1631566490960672/t0(0) o101->a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0@10.8.8.32@o2ib6:5/0 lens 1792/3288 e 1 to 0 dl 1566087455 ref 2 fl Interpret:/0/0 rc 0/0 Aug 17 17:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Aug 17 17:17:37 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 17 17:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Aug 17 17:17:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 17:17:37 fir-md1-s1 kernel: Lustre: 97656:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566087450/real 1566087450] req@ffff8f2066bb8600 x1636771837700064/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566087457 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 17 17:17:37 fir-md1-s1 kernel: Lustre: 
97656:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 17 17:17:51 fir-md1-s1 kernel: LustreError: 97656:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.0.64@o2ib4) failed to reply to blocking AST (req@ffff8f2066bb8600 x1636771837700064 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f0812ca0900/0x5d9ee6d4516229af lrc: 4/0,0 mode: PR/PR res: [0x2c002c581:0x17173:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.0.64@o2ib4 remote: 0x56ac9c2c43d2b783 expref: 11828 pid: 23761 timeout: 5202553 lvb_type: 0 Aug 17 17:17:51 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.0.64@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 17 17:17:51 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 36s: evicting client at 10.9.0.64@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0812ca0900/0x5d9ee6d4516229af lrc: 3/0,0 mode: PR/PR res: [0x2c002c581:0x17173:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.0.64@o2ib4 remote: 0x56ac9c2c43d2b783 expref: 11829 pid: 23761 timeout: 0 lvb_type: 0 Aug 17 17:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 39e76845-4976-21c9-38bb-bb738759d72c (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2647368800, cur 1566087641 expire 1566087491 last 1566087414 Aug 17 17:20:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 18:29:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 88da9cf6-7221-7094-5cfd-f52f4ceab24d (at 10.9.114.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f39f8769000, cur 1566091798 expire 1566091648 last 1566091571 Aug 17 18:29:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 17 18:54:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 17 19:16:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 55de9ba9-39b0-c299-474f-e1f05cb71bde (at 10.8.18.23@o2ib6) Aug 17 19:16:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 19:16:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f88cd672-b988-5a0d-0805-b245dcab0ad3 (at 10.8.18.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f29d4818400, cur 1566094613 expire 1566094463 last 1566094386 Aug 17 19:16:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 19:20:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 146eb3ae-7075-0c36-ed9d-de4cbb7061f3 (at 10.8.18.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f31b3917c00, cur 1566094830 expire 1566094680 last 1566094603 Aug 17 19:20:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 19:20:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c5a0c21e-5cbc-adc9-2da1-740b5c874dda (at 10.8.18.13@o2ib6) Aug 17 19:20:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 17 20:40:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c689b405-9013-9f55-4c30-3ec9525bf810 (at 10.9.106.53@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f2752a000, cur 1566099655 expire 1566099505 last 1566099428 Aug 17 20:40:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 01:08:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8d5a7230-e899-d3b2-2a27-11dcff3b1c6b (at 10.8.26.23@o2ib6) Aug 18 01:08:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 08:03:06 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 18 08:03:06 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 18 08:03:11 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 18 08:03:11 fir-md1-s1 kernel: Lustre: 23555:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 231 previous similar messages Aug 18 08:21:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5ff0d9f4-a5e3-0ea4-6724-c12e7a460650 (at 10.8.18.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16c15cfc00, cur 1566141711 expire 1566141561 last 1566141484 Aug 18 08:21:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 08:22:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5ff0d9f4-a5e3-0ea4-6724-c12e7a460650 (at 10.8.18.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d08ab8800, cur 1566141721 expire 1566141571 last 1566141494 Aug 18 08:22:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 175c2cbc-890f-ba20-a8dd-deeb6d07ea97 (at 10.8.18.22@o2ib6) Aug 18 08:22:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 22:40:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 527b80ba-d679-e838-d973-d86b13aeb5ff (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1bcc35a800, cur 1566193235 expire 1566193085 last 1566193008 Aug 18 22:40:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 18 22:40:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 22:40:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 22:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fc673cc5-5dd8-0d15-3067-84902cd4b6a0 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f43b0e54000, cur 1566194393 expire 1566194243 last 1566194166 Aug 18 22:59:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:00:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:00:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:05:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 35ff08f3-f6cb-0e86-8af2-67e2d8a655d5 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d39c5a800, cur 1566194734 expire 1566194584 last 1566194507 Aug 18 23:05:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:05:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:05:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:12:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cf66b6c0-a83a-2201-a189-949e1a8a7e65 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c7f137000, cur 1566195172 expire 1566195022 last 1566194945 Aug 18 23:12:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:13:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:13:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:18:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9bf13f13-e2bc-d4e5-deee-0012bae8cd3a (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d370f8000, cur 1566195537 expire 1566195387 last 1566195310 Aug 18 23:18:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:26:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:26:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:37:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d2457429-922f-5ba6-63d5-6ca0dab45a98 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ae4271800, cur 1566196667 expire 1566196517 last 1566196440 Aug 18 23:37:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:39:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:39:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:39:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.12@o2ib6, removing former export from same NID Aug 18 23:39:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:45:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d32eefcf-60c1-b906-d404-3f0309f470eb (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f325bef1800, cur 1566197128 expire 1566196978 last 1566196901 Aug 18 23:45:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:46:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:46:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:51:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8f65b359-5be6-76a0-fbb0-663e28213052 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1825cb4c00, cur 1566197466 expire 1566197316 last 1566197239 Aug 18 23:51:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:52:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:52:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:56:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ab07bc46-8dbd-1ed8-5709-12a877927bb0 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2d233be800, cur 1566197815 expire 1566197665 last 1566197588 Aug 18 23:56:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 18 23:58:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 18 23:58:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 00:19:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2cc302ed-0638-c203-29d0-f6d304a91b31 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1c02e83c00, cur 1566199141 expire 1566198991 last 1566198914 Aug 19 00:19:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 00:22:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 19 00:22:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 00:47:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e3bd5a0d-108e-ac33-7a0b-daa5d3b659bd (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f3eb4a000, cur 1566200843 expire 1566200693 last 1566200616 Aug 19 00:47:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 00:48:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 19 00:48:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:07:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 58128319-a740-0505-50d7-1da63ad8e377 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2b83cc4c00, cur 1566202072 expire 1566201922 last 1566201845 Aug 19 01:07:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:15:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 19 01:15:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:37:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 463d07cb-e985-ad42-996f-b9fbcee1e6a7 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3518327400, cur 1566203820 expire 1566203670 last 1566203593 Aug 19 01:37:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:39:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 19 01:39:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:39:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.12@o2ib6, removing former export from same NID Aug 19 01:39:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 19 01:44:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b687c540-502b-27d0-e467-296444bd2962 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f343e761c00, cur 1566204249 expire 1566204099 last 1566204022 Aug 19 01:44:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:45:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2cc9d282-648b-5730-b2e6-a8256e61fe5b (at 10.8.1.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f28d9f75800, cur 1566204354 expire 1566204204 last 1566204127 Aug 19 01:45:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:46:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 19 01:46:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 01:47:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0246e46d-1b57-ecfe-ed0c-5c182444f97b (at 10.8.1.32@o2ib6) Aug 19 01:47:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 02:09:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d1800347-72ce-eadd-608d-51a435000390 (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f30ad363400, cur 1566205784 expire 1566205634 last 1566205557 Aug 19 03:11:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7a661b61-892c-9ef5-2ffb-969e39c660c0 (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a40bef800, cur 1566209488 expire 1566209338 last 1566209261 Aug 19 03:11:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 03:12:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 290d6608-03d4-0bb1-48e8-288d4a314d54 (at 10.8.28.9@o2ib6) Aug 19 03:12:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 04:27:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9fd76833-8ab6-b007-315f-f6cb00329bdd (at 10.8.1.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34ecb75400, cur 1566214041 expire 1566213891 last 1566213814 Aug 19 04:27:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 04:29:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0246e46d-1b57-ecfe-ed0c-5c182444f97b (at 10.8.1.32@o2ib6) Aug 19 04:29:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 05:29:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0246e46d-1b57-ecfe-ed0c-5c182444f97b (at 10.8.1.32@o2ib6) Aug 19 05:29:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 06:09:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0246e46d-1b57-ecfe-ed0c-5c182444f97b (at 10.8.1.32@o2ib6) Aug 19 06:09:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 06:22:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a253ad48-fe6f-56d5-df20-3fec8fbfa0d2 (at 10.9.116.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e15fa8c00, cur 1566220952 expire 1566220802 last 1566220725 Aug 19 06:22:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 06:24:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 219fc99a-1206-4f42-4b15-ea15722edea9 (at 10.9.116.7@o2ib4) Aug 19 06:24:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 07:01:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 904bd105-fefc-cbe7-ee1c-8f3381873cf6 (at 10.9.113.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24e32d7800, cur 1566223292 expire 1566223142 last 1566223065 Aug 19 07:01:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 08:10:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 75e2ae05-f866-35cc-7658-2ea7661ae9f4 (at 10.9.104.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1c5ac63c00, cur 1566227456 expire 1566227306 last 1566227229 Aug 19 08:10:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 08:11:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a15e8f7f-c63f-bd44-a967-0ea3bbbe4e4d (at 10.9.104.7@o2ib4) Aug 19 08:11:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 08:56:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 29e229ef-0b7d-e0ce-48dd-1c614dad7928 (at 10.9.112.15@o2ib4) Aug 19 08:56:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 08:57:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6dea6116-7d9e-a85e-3b90-a58c496b99d3 (at 10.9.115.10@o2ib4) Aug 19 08:57:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 08:58:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f8aa3103-8036-c7a1-632a-678f419cd911 (at 10.9.113.1@o2ib4) Aug 19 08:58:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 08:59:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
beb38144-d000-b47c-bba7-ccce9e6df4a5 (at 10.9.114.10@o2ib4) Aug 19 08:59:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 09:02:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 40f1504a-c272-a5f6-4ce1-4c06c50afda7 (at 10.9.106.53@o2ib4) Aug 19 09:02:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 09:07:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8cd7f013-9a17-1cbd-e8a4-7acaf2642e05 (at 10.8.26.18@o2ib6) Aug 19 09:07:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 09:12:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.34@o2ib4) Aug 19 09:12:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 09:18:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d5145b19-7e77-2465-cb06-19cf549382e1 (at 10.8.7.8@o2ib6) Aug 19 09:18:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 10:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c14bf4c5-b9f6-d04f-2c8a-c85dd78efbd5 (at 10.9.109.45@o2ib4) reconnecting Aug 19 10:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c01d2cd4-adbc-3e30-cd67-6065a5747f47 (at 10.9.109.45@o2ib4) Aug 19 10:56:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 19 10:56:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 19 10:57:12 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.109.45@o2ib4, removing former export from same NID Aug 19 10:57:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c01d2cd4-adbc-3e30-cd67-6065a5747f47 (at 10.9.109.45@o2ib4) Aug 19 10:57:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 19 11:06:38 fir-md1-s1 kernel: LustreError: 14660:0:(mgs_llog.c:5131:mgs_set_conf_param()) No filesystem targets for oak. 
cfg_device from lctl is 'oak' Aug 19 11:06:38 fir-md1-s1 kernel: LustreError: 14660:0:(mgs_handler.c:1031:mgs_iocontrol()) MGS: setparam err: rc = -22 Aug 19 11:11:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5c9f5376-a105-7e2f-1c52-759657f6fd7d (at 10.9.101.59@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2521159000, cur 1566238300 expire 1566238150 last 1566238073 Aug 19 11:11:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 11:47:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5d6eb12a-986d-3a49-d3ca-602d8bd21b2a (at 10.9.101.59@o2ib4) Aug 19 12:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 31496984-2417-54d9-d969-3e3ac1f4f078 (at 10.8.9.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1ce2d38400, cur 1566242162 expire 1566242012 last 1566241935 Aug 19 12:16:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 12:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ffae36e1-3410-424f-9745-916084c8fe02 (at 10.8.19.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2eab302c00, cur 1566242481 expire 1566242331 last 1566242254 Aug 19 12:21:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 12:39:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d2e6b7c8-4f9c-d861-3c27-61235b570303 (at 10.8.26.14@o2ib6) Aug 19 12:39:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 12:43:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1f29c7a2-d2d3-0a98-27b0-578e87d088ab (at 10.8.9.2@o2ib6) Aug 19 12:43:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 12:44:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f37bcae1-5968-c42e-e3ff-512bc53b2aa0 (at 10.8.19.5@o2ib6) Aug 19 12:44:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 12:44:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3dfded4-9107-609c-ae57-2966b58f71e4 (at 10.8.9.1@o2ib6) Aug 19 12:44:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 12:46:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 19 12:46:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 13:57:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e0b5a260-f551-c8e2-c8c6-f384132786cf (at 10.9.108.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f3a99c000, cur 1566248228 expire 1566248078 last 1566248001 Aug 19 13:57:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 14:24:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.22@o2ib4) Aug 19 14:24:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 14:49:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c14bf4c5-b9f6-d04f-2c8a-c85dd78efbd5 (at 10.9.109.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f252babb800, cur 1566251370 expire 1566251220 last 1566251143 Aug 19 14:49:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 15:30:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 54ff1087-ffbc-2b4f-1b08-bd5d84bfdf36 (at 10.8.26.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34931bf000, cur 1566253806 expire 1566253656 last 1566253579 Aug 19 15:30:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 15:32:24 fir-md1-s1 kernel: Lustre: 21181:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3303d7d100 x1631566993341904/t0(0) o101->a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0@10.8.8.32@o2ib6:29/0 lens 480/568 e 1 to 0 dl 1566253949 ref 2 fl Interpret:/0/0 rc 0/0 Aug 19 15:32:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) reconnecting Aug 19 15:32:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Aug 19 15:32:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 15:32:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.8.32@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f173ab4b600/0x5d9ee6d623dd3b1c lrc: 3/0,0 mode: PR/PR res: [0x2c002c941:0x320c:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.8.32@o2ib6 remote: 0x686b261c5a94b0b9 expref: 1190 pid: 50583 timeout: 5369018 lvb_type: 0 Aug 19 15:32:38 fir-md1-s1 kernel: LustreError: 23590:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2523430800 ns: mdt-fir-MDT0002_UUID lock: ffff8f1d5426b600/0x5d9ee6d623dd53a3 lrc: 3/0,0 mode: PW/PW res: [0x2c002c941:0x320c:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.8.32@o2ib6 remote: 
0x686b261c5a94b0ff expref: 1071 pid: 23590 timeout: 0 lvb_type: 0 Aug 19 15:32:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Aug 19 15:36:05 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 19 15:36:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3488c6a400, cur 1566254186 expire 1566254036 last 1566253959 Aug 19 15:36:26 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 19 15:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2defae61-8bf0-dee6-7d48-53b83a69e973 (at 10.8.17.24@o2ib6) reconnecting Aug 19 15:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 61b2e93e-80a5-0e4a-1403-32b1efead904 (at 10.8.17.24@o2ib6) Aug 19 15:37:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 19 15:37:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 96f77fe0-d0c2-629d-bb62-dcf685e7e47d (at 10.9.0.61@o2ib4) reconnecting Aug 19 15:37:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6020dc6-5ae0-1fda-6229-432d9300dcb9 (at 10.9.0.61@o2ib4) Aug 19 15:39:13 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 19 15:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 96f77fe0-d0c2-629d-bb62-dcf685e7e47d (at 10.9.0.61@o2ib4) reconnecting Aug 19 15:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b6020dc6-5ae0-1fda-6229-432d9300dcb9 (at 10.9.0.61@o2ib4) Aug 19 15:43:01 fir-md1-s1 kernel: LustreError: 25030:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 
1566254581 with bad export cookie 6746083099252935148 Aug 19 15:43:01 fir-md1-s1 kernel: LustreError: 48116:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254581 with bad export cookie 6746083099252935148 Aug 19 15:43:01 fir-md1-s1 kernel: LustreError: 48116:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages Aug 19 15:43:03 fir-md1-s1 kernel: LustreError: 25030:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254583 with bad export cookie 6746083099252935148 Aug 19 15:43:03 fir-md1-s1 kernel: LustreError: 25030:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 4 previous similar messages Aug 19 15:43:06 fir-md1-s1 kernel: LustreError: 48116:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254586 with bad export cookie 6746083099252935148 Aug 19 15:43:06 fir-md1-s1 kernel: LustreError: 48116:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 6 previous similar messages Aug 19 15:43:10 fir-md1-s1 kernel: LustreError: 22009:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254590 with bad export cookie 6746083099252935148 Aug 19 15:43:10 fir-md1-s1 kernel: LustreError: 22009:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 27 previous similar messages Aug 19 15:43:18 fir-md1-s1 kernel: LustreError: 22895:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254598 with bad export cookie 6746083099252935148 Aug 19 15:43:18 fir-md1-s1 kernel: LustreError: 22895:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 22 previous similar messages Aug 19 15:43:36 fir-md1-s1 kernel: LustreError: 25081:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254616 with bad export cookie 6746083099252935148 Aug 19 15:43:36 fir-md1-s1 kernel: LustreError: 25081:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) 
Skipped 33 previous similar messages Aug 19 15:44:08 fir-md1-s1 kernel: LustreError: 31015:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254648 with bad export cookie 6746083099252935148 Aug 19 15:44:08 fir-md1-s1 kernel: LustreError: 31015:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 55 previous similar messages Aug 19 15:45:12 fir-md1-s1 kernel: LustreError: 46811:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.32@o2ib6 arrived at 1566254712 with bad export cookie 6746083099252935148 Aug 19 15:45:12 fir-md1-s1 kernel: LustreError: 46811:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 158 previous similar messages Aug 19 15:47:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Aug 19 15:49:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 85a09051-e66c-e51b-3014-f2fbb0940f04 (at 10.8.1.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2c76e9c000, cur 1566254976 expire 1566254826 last 1566254749 Aug 19 15:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a5eec2e6-62e8-19e2-7ed8-f567dc50fbb0 (at 10.8.8.32@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2edae32c00, cur 1566255052 expire 1566254902 last 1566254828 Aug 19 15:50:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 15:54:09 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 19 15:54:09 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (105): c: 8, oc: 0, rc: 8 Aug 19 15:55:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0246e46d-1b57-ecfe-ed0c-5c182444f97b (at 10.8.1.32@o2ib6) Aug 19 15:59:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6726fecc-3078-ba4a-fb68-64e928250f1f (at 10.9.102.31@o2ib4) Aug 19 15:59:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 16:03:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4ac662d8-8e5b-be1f-715e-6cfca148208b (at 10.9.116.6@o2ib4) Aug 19 16:03:58 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 19 16:06:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 842f6d8e-7146-d6c4-096a-99848fb525f9 (at 10.9.101.56@o2ib4) Aug 19 16:06:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 16:10:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cabca898-a90b-ce85-f91e-fd5668d5390c (at 10.8.27.26@o2ib6) Aug 19 16:10:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 19 16:33:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to decafcd4-1025-6a4f-aec1-3ef23146e982 (at 10.9.116.5@o2ib4) Aug 19 16:33:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 17:02:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 17:02:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 19:18:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a97dfeca-d440-63a5-81d7-9d4a6d540295 (at 10.9.108.36@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f0da0e74000, cur 1566267480 expire 1566267330 last 1566267253 Aug 19 19:20:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 967696f3-fdf4-8ac7-e6d3-30b5c0bda814 (at 10.9.108.36@o2ib4) Aug 19 19:20:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 23:48:18 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566283691/real 1566283691] req@ffff8f0f8f4b2400 x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283698 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 19 23:48:18 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 19 23:48:25 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566283698/real 1566283698] req@ffff8f0f8f4b2400 x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283705 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 19 23:48:26 fir-md1-s1 kernel: Lustre: 20541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f137d957b00 x1642342234019712/t0(0) o101->4b5e38e5-fc92-e3a6-2ac7-67a297229875@10.0.10.3@o2ib7:1/0 lens 480/568 e 1 to 0 dl 1566283711 ref 2 fl Interpret:/0/0 rc 0/0 Aug 19 23:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:48:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 23:48:32 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566283705/real 1566283705] req@ffff8f0f8f4b2400 
x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283712 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 19 23:48:46 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566283719/real 1566283719] req@ffff8f0f8f4b2400 x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283726 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 19 23:48:46 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 19 23:48:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:48:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:49:07 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566283740/real 1566283740] req@ffff8f0f8f4b2400 x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283747 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 19 23:49:07 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 19 23:49:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:49:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:49:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:49:42 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for slow reply: [sent 1566283775/real 1566283775] req@ffff8f0f8f4b2400 x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283782 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 19 23:49:42 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 19 23:49:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:49:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:50:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:50:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:50:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:50:52 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566283845/real 1566283845] req@ffff8f0f8f4b2400 x1636773427463984/t0(0) o106->fir-MDT0000@10.9.102.20@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566283852 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 19 23:50:52 fir-md1-s1 kernel: Lustre: 23568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 19 23:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b5e38e5-fc92-e3a6-2ac7-67a297229875 (at 10.0.10.3@o2ib7) reconnecting Aug 19 23:51:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 19 23:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 19 23:51:23 fir-md1-s1 
kernel: Lustre: Skipped 1 previous similar message Aug 19 23:51:31 fir-md1-s1 kernel: LNet: Service thread pid 23568 was inactive for 200.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 19 23:51:31 fir-md1-s1 kernel: Pid: 23568, comm: mdt00_058 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 19 23:51:31 fir-md1-s1 kernel: Call Trace: Aug 19 23:51:31 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 19 23:51:31 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 19 23:51:31 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 19 23:51:31 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 19 23:51:31 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 19 23:51:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 19 23:51:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 19 23:51:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 19 23:51:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566283891.23568 Aug 19 23:51:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d1426453-dd71-1a3f-ad8b-0d74577e4781 (at 10.9.102.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f145a30ec00, cur 1566283906 expire 1566283756 last 1566283679 Aug 19 23:51:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 19 23:51:46 fir-md1-s1 kernel: LNet: Service thread pid 23568 completed after 214.40s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 20 01:21:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 13696a6e-8174-8168-ce9f-13e2ae76f5c2 (at 10.9.106.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f07e1b4f400, cur 1566289293 expire 1566289143 last 1566289066 Aug 20 01:21:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 01:23:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 359050aa-7d81-42d8-9871-799516e9467e (at 10.9.106.55@o2ib4) Aug 20 01:23:46 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 20 01:33:57 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e89cfbe6-1988-7993-6265-55133f531b16 (at 10.8.21.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f42d50fc400, cur 1566290037 expire 1566289887 last 1566289810 Aug 20 01:33:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 01:34:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 36270d6a-6063-9df6-30ef-89e9be6fb29b (at 10.8.21.8@o2ib6) Aug 20 01:34:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 02:30:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 3dc8f7f0-0159-c02d-1c69-b89874ac6ab8 (at 10.8.24.20@o2ib6) Aug 20 02:30:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 03:37:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.20@o2ib4) Aug 20 03:37:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 07:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 746b3d8e-c221-65e3-9e0b-3d48071d79a2 (at 10.9.0.81@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f0a01b41800, cur 1566312450 expire 1566312300 last 1566312223 Aug 20 07:49:02 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 07:49:02 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 72 previous similar messages Aug 20 07:49:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 07:49:03 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 20 07:52:38 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 07:52:44 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 07:52:44 fir-md1-s1 kernel: Lustre: 10307:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 20 07:55:06 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 07:55:06 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 20 08:00:20 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 08:06:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3dfded4-9107-609c-ae57-2966b58f71e4 (at 10.8.9.1@o2ib6) Aug 20 08:06:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 08:09:23 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 09:17:30 fir-md1-s1 kernel: Lustre: 10198:0:(mdd_device.c:1794:mdd_changelog_clear()) 
fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 09:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client de332a97-571d-85cf-fddf-b5634ebd2059 (at 10.9.106.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f32c521ec00, cur 1566318103 expire 1566317953 last 1566317876 Aug 20 09:21:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 09:23:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 22280de3-e127-2943-2417-f27756433740 (at 10.9.0.81@o2ib4) Aug 20 09:23:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 09:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 04c17dce-45f1-fe7e-2627-7efeaaeaddb9 (at 10.9.0.62@o2ib4) reconnecting Aug 20 09:36:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 20 09:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 20 09:36:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 09:41:25 fir-md1-s1 kernel: Lustre: 23561:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 10:13:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b61e6888-04f2-1956-3222-9ddedcb986c4 (at 10.9.106.9@o2ib4) Aug 20 13:41:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 20 13:41:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 14:53:49 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 20 14:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 51002e48-a06e-3405-fcaa-ac377ed743af (at 10.8.17.9@o2ib6) reconnecting Aug 20 14:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 0b038678-5105-7beb-2ba3-b7f535c36e94 (at 10.8.17.9@o2ib6) Aug 20 14:53:56 fir-md1-s1 
kernel: Lustre: Skipped 2 previous similar messages Aug 20 14:54:14 fir-md1-s1 kernel: Lustre: 23561:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 14:54:14 fir-md1-s1 kernel: Lustre: 23561:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 20 15:23:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f4e109a3-1e90-c790-4690-6ae8b31fae28 (at 10.9.114.13@o2ib4) Aug 20 15:23:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 95a08d8f-9642-dfa3-196e-6690fe5cc975 (at 10.9.114.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ce57c6400, cur 1566339802 expire 1566339652 last 1566339575 Aug 20 15:23:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 15:28:55 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 15:39:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8d4f52de-9afb-71e7-a087-b14fa81c7c60 (at 10.8.30.1@o2ib6) Aug 20 15:39:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 16:29:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dd055160-73e1-c0f8-3c11-ca5351f1fd45 (at 10.9.105.71@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f457fcf8000, cur 1566343767 expire 1566343617 last 1566343540 Aug 20 16:29:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 16:41:27 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 20 17:04:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6e3864dd-c107-e988-adae-aa024a37aa2f (at 10.9.105.71@o2ib4) Aug 20 17:04:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 17:08:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e5b93380-d799-cd49-04b1-82827b5a442d (at 10.9.108.51@o2ib4) Aug 20 17:08:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 17:12:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to be98bff5-9c81-fd8a-3462-cd8ead57c496 (at 10.9.110.6@o2ib4) Aug 20 17:12:38 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 20 17:22:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 79e647da-c3f6-a3be-d8fe-44afe2c61e65 (at 10.9.104.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f250acd8000, cur 1566346953 expire 1566346803 last 1566346726 Aug 20 17:22:33 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 20 17:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6a2c60e3-72ad-ad36-43cf-32103ac84dfa (at 10.9.116.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148cb8f800, cur 1566348845 expire 1566348695 last 1566348618 Aug 20 17:54:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 17:55:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 219fc99a-1206-4f42-4b15-ea15722edea9 (at 10.9.116.7@o2ib4) Aug 20 17:55:13 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 20 17:59:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4a7344e0-97c7-2efc-aa49-ef6594e7ea2a (at 10.9.104.64@o2ib4) Aug 20 17:59:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 18:20:28 fir-md1-s1 kernel: Lustre: 23708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 18:20:28 fir-md1-s1 kernel: Lustre: 23708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 20 18:22:26 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 18:25:49 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 20 18:41:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 139e0fd1-034a-aa93-b6c1-2e2ccf18da01 (at 10.9.110.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f19f6483800, cur 1566351705 expire 1566351555 last 1566351478 Aug 20 18:41:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 18:41:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 139e0fd1-034a-aa93-b6c1-2e2ccf18da01 (at 10.9.110.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2bf7d89000, cur 1566351707 expire 1566351557 last 1566351480 Aug 20 18:41:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 20 18:42:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ab062f53-2bf4-b417-57a5-92121b8c332b (at 10.9.110.37@o2ib4) Aug 20 18:42:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 18:58:33 fir-md1-s1 kernel: Lustre: 49251:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06b1a59450 x1640814414973168/t0(0) o3->86ce6a09-aa4f-20ac-b141-ea6861e1967e@10.9.101.39@o2ib4:8/0 lens 488/440 e 1 to 0 dl 1566352718 ref 2 fl Interpret:/0/0 rc 0/0 Aug 20 22:19:01 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.57@o2ib4, removing former export from same NID Aug 20 22:19:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2d807e06-532a-262c-bdb4-7dba5f9db0e8 (at 10.9.107.57@o2ib4) Aug 20 22:19:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 20 22:20:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.57@o2ib4, removing former export from same NID Aug 20 22:20:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2d807e06-532a-262c-bdb4-7dba5f9db0e8 (at 10.9.107.57@o2ib4) Aug 20 23:56:23 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 20 23:56:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 20 23:56:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 21 00:32:08 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 01:31:59 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: 
Failure to clear the changelog for user 1: -22 Aug 21 01:31:59 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 21 01:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ddea348b-e5a4-5330-325a-755d459e8dda (at 10.9.107.57@o2ib4) reconnecting Aug 21 01:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2d807e06-532a-262c-bdb4-7dba5f9db0e8 (at 10.9.107.57@o2ib4) Aug 21 01:48:05 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 01:48:05 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 02:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a389a0dd-207b-e44b-45a8-abab1b354808 (at 10.9.109.45@o2ib4) reconnecting Aug 21 02:13:05 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.109.45@o2ib4, removing former export from same NID Aug 21 02:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c01d2cd4-adbc-3e30-cd67-6065a5747f47 (at 10.9.109.45@o2ib4) Aug 21 02:13:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 21 02:13:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a389a0dd-207b-e44b-45a8-abab1b354808 (at 10.9.109.45@o2ib4) reconnecting Aug 21 02:13:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c01d2cd4-adbc-3e30-cd67-6065a5747f47 (at 10.9.109.45@o2ib4) Aug 21 02:13:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 02:14:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.109.45@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 21 02:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2d756374-54f3-168c-5d53-2ddb4062024e (at 10.9.109.55@o2ib4) reconnecting Aug 21 02:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to fe19c866-2477-4fa3-9cdd-b5d4f9018971 (at 10.9.109.55@o2ib4) Aug 21 02:30:59 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 02:47:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98a2e267-7ec4-26e6-8e49-234410a6b030 (at 10.9.108.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45307de400, cur 1566380822 expire 1566380672 last 1566380595 Aug 21 02:52:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.57@o2ib4, removing former export from same NID Aug 21 02:52:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2d807e06-532a-262c-bdb4-7dba5f9db0e8 (at 10.9.107.57@o2ib4) Aug 21 02:53:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.57@o2ib4, removing former export from same NID Aug 21 02:53:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2d807e06-532a-262c-bdb4-7dba5f9db0e8 (at 10.9.107.57@o2ib4) Aug 21 03:21:55 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.109.15@o2ib4, removing former export from same NID Aug 21 03:21:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to be7862a4-c456-2986-26e5-d6fe05e63091 (at 10.9.109.15@o2ib4) Aug 21 03:53:34 fir-md1-s1 kernel: Lustre: 23571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 04:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 44075f0e-eeaa-5b68-aad0-852740e28e93 (at 10.9.109.15@o2ib4) reconnecting Aug 21 04:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to be7862a4-c456-2986-26e5-d6fe05e63091 (at 10.9.109.15@o2ib4) Aug 21 06:21:28 fir-md1-s1 kernel: Lustre: 
21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 06:21:28 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 21 06:25:24 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 06:25:24 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Aug 21 06:54:36 fir-md1-s1 kernel: Lustre: 23708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 07:09:03 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 07:09:03 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 26 previous similar messages Aug 21 07:43:20 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 07:45:36 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 07:45:36 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Aug 21 08:06:19 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 08:09:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3f7416a6-2c4b-0bfc-9756-ce2a59707b86 (at 10.8.17.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f15adc4b000, cur 1566400147 expire 1566399997 last 1566399920 Aug 21 08:09:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 08:17:45 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 08:18:58 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 08:26:35 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 08:26:35 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 21 08:26:53 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 08:26:53 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 08:27:05 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 08:27:05 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 21 09:10:10 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:13:12 fir-md1-s1 kernel: Lustre: 21368:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:19:46 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:21:00 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:21:00 fir-md1-s1 kernel: Lustre: 
23556:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 21 09:21:58 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:21:58 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 21 09:27:45 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:27:45 fir-md1-s1 kernel: Lustre: 23585:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 09:29:43 fir-md1-s1 kernel: Lustre: 23589:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:29:43 fir-md1-s1 kernel: Lustre: 23589:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 09:33:08 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 09:33:08 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 09:36:11 fir-md1-s1 kernel: Lustre: 23706:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:26:14 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:29:03 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:29:03 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 21 10:42:15 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 21 10:42:52 fir-md1-s1 kernel: Lustre: 
23708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:42:52 fir-md1-s1 kernel: Lustre: 23708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2709 previous similar messages Aug 21 10:47:34 fir-md1-s1 kernel: Lustre: 23572:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:50:28 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:50:28 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 10:53:50 fir-md1-s1 kernel: Lustre: 23589:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 10:57:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e29f6df1-f303-5d4a-10c4-be6c059f963a (at 10.9.104.46@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fdc72c00, cur 1566410264 expire 1566410114 last 1566410037 Aug 21 10:57:44 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 21 10:57:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e29f6df1-f303-5d4a-10c4-be6c059f963a (at 10.9.104.46@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f451eab8c00, cur 1566410269 expire 1566410119 last 1566410042 Aug 21 11:03:44 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 11:08:53 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 11:08:53 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 21 11:21:14 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 11:21:14 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 11:43:26 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 11:43:26 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 21 11:51:31 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 11:51:31 fir-md1-s1 kernel: Lustre: 10305:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Aug 21 11:54:46 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 11:54:46 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 21 12:25:39 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 12:25:39 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 22 previous similar messages Aug 21 12:29:23 fir-md1-s1 kernel: Lustre: 
23570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 12:29:23 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 21 12:32:16 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 12:32:16 fir-md1-s1 kernel: Lustre: 23556:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 21 12:39:38 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 12:39:38 fir-md1-s1 kernel: Lustre: 25676:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Aug 21 12:45:54 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 12:45:54 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 21 12:59:49 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 12:59:49 fir-md1-s1 kernel: Lustre: 23570:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Aug 21 13:14:37 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 13:14:37 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 20 previous similar messages Aug 21 16:29:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a97cc4f6-6e92-7669-8f75-f73e64eb3df2 (at 10.9.108.35@o2ib4) Aug 21 16:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98a2e267-7ec4-26e6-8e49-234410a6b030 (at 10.9.108.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0c410bd400, cur 1566430465 expire 1566430315 last 1566430238 Aug 21 16:34:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 21 16:51:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9ce3d33b-8432-50bd-caed-34c9357d720c (at 10.8.8.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f213ec56400, cur 1566431465 expire 1566431315 last 1566431238 Aug 21 16:52:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 44fa4d3c-74dc-6646-84b4-35b992ff2b3e (at 10.8.8.33@o2ib6) in 165 seconds. I think it's dead, and I am evicting it. exp ffff8f0c95aa8800, cur 1566431541 expire 1566431391 last 1566431376 Aug 21 16:52:21 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 21 16:54:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 98b862eb-2092-2bce-5946-abd2c64dd438 (at 10.9.104.46@o2ib4) Aug 21 17:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f35e9783400, cur 1566432154 expire 1566432004 last 1566431927 Aug 21 17:02:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 17:02:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e18301fc-f860-0db4-bf24-6c606e0cc839 (at 10.8.8.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3216bbe000, cur 1566432157 expire 1566432007 last 1566431930 Aug 21 17:02:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 21 17:19:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e1008ff1-7911-4d3d-cd72-11efd094b730 (at 10.8.8.30@o2ib6) Aug 21 17:19:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 17:22:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 347ffbdc-328a-c7b5-0dc8-6a73375f2e66 (at 10.8.8.33@o2ib6) Aug 21 17:22:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 17:31:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a7eec79a-7d8d-cc60-d534-cf51564be7b7 (at 10.8.8.32@o2ib6) Aug 21 17:31:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 17:41:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Aug 21 17:41:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 18:03:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8172217c-cb28-d209-5f1f-4aceb1d4d3a6 (at 10.8.8.31@o2ib6) Aug 21 18:03:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 21 20:06:44 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 20:06:44 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 21 20:08:51 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 20:08:51 fir-md1-s1 kernel: Lustre: 10195:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Aug 21 20:11:31 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 20:11:31 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) 
Skipped 152 previous similar messages Aug 21 20:25:51 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 20:25:51 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 187 previous similar messages Aug 21 20:38:20 fir-md1-s1 kernel: Lustre: 23648:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 20:38:20 fir-md1-s1 kernel: Lustre: 23648:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 21 previous similar messages Aug 21 20:48:35 fir-md1-s1 kernel: Lustre: 20541:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 21:10:26 fir-md1-s1 kernel: Lustre: 23577:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 21:10:26 fir-md1-s1 kernel: Lustre: 23577:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Aug 21 22:26:02 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 21 22:26:02 fir-md1-s1 kernel: Lustre: 21670:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Aug 22 00:30:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 75a243e1-9f6a-0fab-ec0e-ce32dad51415 (at 10.9.106.71@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4532f5c000, cur 1566459029 expire 1566458879 last 1566458802 Aug 22 01:21:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 535fa562-c7d9-3df6-858a-3a5b64365a2a (at 10.8.27.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2520cf3400, cur 1566462074 expire 1566461924 last 1566461847 Aug 22 01:21:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 01:22:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7ae38cd3-4507-df99-00f6-07dde94a26a6 (at 10.8.27.32@o2ib6) Aug 22 01:22:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 01:28:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9c273d13-a6d2-7016-4b48-73a643e7570f (at 10.9.106.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f453e966800, cur 1566462530 expire 1566462380 last 1566462303 Aug 22 01:28:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 01:29:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 32173935-f8f5-2fac-aa1a-9381ea33b14f (at 10.9.106.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1505446400, cur 1566462548 expire 1566462398 last 1566462321 Aug 22 01:29:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 02:22:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3d7e8f12-7be2-ea29-b7bf-4852602a4361 (at 10.9.106.56@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fd034c00, cur 1566465762 expire 1566465612 last 1566465535 Aug 22 02:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5e19627e-4858-13b2-d184-6f0babe2fa23 (at 10.9.106.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4501b96c00, cur 1566467188 expire 1566467038 last 1566466961 Aug 22 02:46:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 02:47:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3ef17f0c-d35b-8428-c1da-c84a40a8bdbc (at 10.9.101.71@o2ib4) in 178 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fe499800, cur 1566467264 expire 1566467114 last 1566467086 Aug 22 02:47:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 02:52:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 277acb86-7d76-fb91-b38b-b09d3f1edfbd (at 10.9.106.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fe5b3800, cur 1566467520 expire 1566467370 last 1566467293 Aug 22 02:52:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 02:59:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ebdfc146-91cd-0738-28b3-d1a3e9c5bb1d (at 10.9.106.32@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2530bee000, cur 1566467967 expire 1566467817 last 1566467740 Aug 22 02:59:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 22 03:02:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f6380bdc-cbb7-9ae0-5b59-79b2a24e3c00 (at 10.9.106.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f34fc384000, cur 1566468167 expire 1566468017 last 1566467940 Aug 22 03:02:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 76ed0af9-aa81-fdfa-a462-54cb6855d00e (at 10.9.106.20@o2ib4) in 177 seconds. I think it's dead, and I am evicting it. exp ffff8f45067ab400, cur 1566468243 expire 1566468093 last 1566468066 Aug 22 03:04:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:08:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5580c86e-93fc-ec0b-7809-c452eedb4044 (at 10.9.106.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148a3ec400, cur 1566468520 expire 1566468370 last 1566468293 Aug 22 03:08:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:19:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f070aa79-4085-01c4-e45c-5c90a853bda7 (at 10.9.106.25@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148acf0800, cur 1566469171 expire 1566469021 last 1566468944 Aug 22 03:19:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:20:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b3cd89e1-f655-4588-81fb-0f0ea0fc23e3 (at 10.9.106.27@o2ib4) in 153 seconds. I think it's dead, and I am evicting it. exp ffff8f2524268c00, cur 1566469247 expire 1566469097 last 1566469094 Aug 22 03:20:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:22:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 427422bc-2dc5-3613-8742-f2dc7e69d571 (at 10.9.106.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2533271c00, cur 1566469321 expire 1566469171 last 1566469094 Aug 22 03:22:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 03:27:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d314c916-d68d-db9f-ee0f-59ee4d488258 (at 10.9.106.36@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1489418000, cur 1566469629 expire 1566469479 last 1566469402 Aug 22 03:36:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bc3c4d7f-4161-b1f9-2c95-90855fce208a (at 10.9.107.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2525f79400, cur 1566470175 expire 1566470025 last 1566469948 Aug 22 03:36:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 38516069-da41-9b1a-5b22-4b6fc1dfa003 (at 10.9.107.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34fdba4c00, cur 1566470404 expire 1566470254 last 1566470177 Aug 22 03:40:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 22 03:42:25 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 22 03:42:30 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 22 03:42:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f914310c-7825-8c6a-2b04-354707ee5046 (at 10.9.113.3@o2ib4) reconnecting Aug 22 03:42:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 8651a829-1584-35b1-6264-26a8d5433bb6 (at 10.9.113.3@o2ib4) Aug 22 03:42:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:42:32 fir-md1-s1 kernel: LustreError: 46510:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f266bfe8c50 x1638813984586032/t0(0) o4->f914310c-7825-8c6a-2b04-354707ee5046@10.9.113.3@o2ib4:8/0 lens 488/448 e 0 to 0 dl 1566470558 ref 1 fl Interpret:/0/0 rc 0/0 Aug 22 03:42:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with f914310c-7825-8c6a-2b04-354707ee5046 (at 10.9.113.3@o2ib4), client will retry: rc = -110 Aug 22 03:42:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 03:42:35 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 22 03:42:35 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 22 03:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 923b3b39-c45c-b3ff-a6cb-68a2326b052e (at 10.9.101.52@o2ib4) reconnecting Aug 22 03:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1f535bab-8d90-0e5f-dc54-e753eb6b1dbd (at 10.9.101.52@o2ib4) Aug 22 03:42:40 fir-md1-s1 kernel: LNetError: 
20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 22 03:42:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1a5994ed-f702-43b8-5d0a-573a3a27bb32 (at 10.9.107.32@o2ib4) reconnecting Aug 22 03:42:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 03:42:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 70f54133-ffd4-a498-2383-bec785fc41ac (at 10.9.107.32@o2ib4) Aug 22 03:42:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 03:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ed5af6cd-4904-80f7-1122-ee701add6526 (at 10.9.108.54@o2ib4) reconnecting Aug 22 03:43:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1b6e684d-4e6e-552d-c96f-77964debfd62 (at 10.9.108.54@o2ib4) Aug 22 03:44:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 760ddeee-c61c-91e6-e7c0-149e515bdc27 (at 10.8.8.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ec5398400, cur 1566470643 expire 1566470493 last 1566470416 Aug 22 03:44:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:06:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2efd71b8-9048-a8c6-8d05-137bd19b5501 (at 10.9.0.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0f0ab07000, cur 1566472013 expire 1566471863 last 1566471786 Aug 22 04:06:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:07:12 fir-md1-s1 kernel: LustreError: 20384:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f347604dd00 x1636776572444640/t0(0) o104->fir-MDT0002@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 22 04:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 71ebd024-40a3-334b-d106-6a0e2d0bffbe (at 10.8.27.26@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f221557e800, cur 1566472501 expire 1566472351 last 1566472274 Aug 22 04:15:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:16:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 86f9cd29-c493-e920-8336-c19de9946cf3 (at 10.9.107.25@o2ib4) in 207 seconds. I think it's dead, and I am evicting it. exp ffff8f3505fe5000, cur 1566472577 expire 1566472427 last 1566472370 Aug 22 04:16:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:30:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ddf62e2f-11a1-5fed-4797-b6a3a429c9aa (at 10.9.0.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f13b4985800, cur 1566473415 expire 1566473265 last 1566473188 Aug 22 04:30:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f3b73f80-5edd-b2a2-a7a2-f0eb0f74bb77 (at 10.9.102.47@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252272f400, cur 1566473753 expire 1566473603 last 1566473526 Aug 22 04:35:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:39:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bc5c6181-e3bc-e6aa-f314-551476a23754 (at 10.9.101.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252eb41000, cur 1566473969 expire 1566473819 last 1566473742 Aug 22 04:39:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 04:39:32 fir-md1-s1 kernel: LustreError: 20384:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f28b7572d00 x1636776580488352/t0(0) o104->fir-MDT0000@10.9.101.7@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 22 05:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6528a72c-7dbf-d506-86e5-e12b1d6e7573 (at 10.8.15.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2dfc4ee000, cur 1566476101 expire 1566475951 last 1566475874 Aug 22 05:15:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 05:16:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fdb866e1-5d8c-90ec-89cb-b6117c44bf12 (at 10.9.0.2@o2ib4) in 157 seconds. I think it's dead, and I am evicting it. exp ffff8f16b2fe0000, cur 1566476177 expire 1566476027 last 1566476020 Aug 22 05:16:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 05:17:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fdb866e1-5d8c-90ec-89cb-b6117c44bf12 (at 10.9.0.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f265e624c00, cur 1566476247 expire 1566476097 last 1566476020 Aug 22 05:31:02 fir-md1-s1 kernel: LustreError: 25084:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.108.35@o2ib4 arrived at 1566477062 with bad export cookie 6746083116478776263 Aug 22 05:31:02 fir-md1-s1 kernel: LustreError: 25084:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 92 previous similar messages Aug 22 05:31:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a97cc4f6-6e92-7669-8f75-f73e64eb3df2 (at 10.9.108.35@o2ib4) Aug 22 05:34:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98a2e267-7ec4-26e6-8e49-234410a6b030 (at 10.9.108.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4502967c00, cur 1566477289 expire 1566477139 last 1566477062 Aug 22 05:34:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 06:21:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cf8efa88-afdc-2149-097e-669ccdae8c0c (at 10.9.102.39@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4519118000, cur 1566480106 expire 1566479956 last 1566479879 Aug 22 06:59:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 42800284-789e-e9cc-0ebd-dbacb154f6ac (at 10.9.107.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f453bdbf800, cur 1566482380 expire 1566482230 last 1566482153 Aug 22 06:59:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 07:28:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96f77fe0-d0c2-629d-bb62-dcf685e7e47d (at 10.9.0.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0b93c54800, cur 1566484104 expire 1566483954 last 1566483877 Aug 22 07:28:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 07:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 96f77fe0-d0c2-629d-bb62-dcf685e7e47d (at 10.9.0.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2fb52cec00, cur 1566484105 expire 1566483955 last 1566483878 Aug 22 07:28:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 07:28:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a355d35a-0825-2e84-0717-f583af7beec3 (at 10.9.0.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1bcd784c00, cur 1566484113 expire 1566483963 last 1566483886 Aug 22 07:28:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 07:52:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3bd18d8-3db5-df1c-b07f-336571ebc30a (at 10.9.0.2@o2ib4) Aug 22 07:55:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 22 07:55:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 08:08:01 fir-md1-s1 kernel: Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566486474/real 1566486474] req@ffff8f2ae37abc00 x1636776619349616/t0(0) o104->fir-MDT0000@10.9.101.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566486481 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 22 08:08:01 fir-md1-s1 kernel: Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 22 08:08:09 fir-md1-s1 kernel: Lustre: 10144:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2ffcbc1800 x1642167864699456/t0(0) o36->40ebe744-82bc-a30e-9343-50eaabccaf84@10.9.0.64@o2ib4:14/0 lens 592/2888 e 1 to 0 dl 1566486494 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 08:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 40ebe744-82bc-a30e-9343-50eaabccaf84 (at 10.9.0.64@o2ib4) reconnecting Aug 22 08:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 22 08:08:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 08:08:22 fir-md1-s1 kernel: Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566486495/real 1566486495] req@ffff8f2ae37abc00 x1636776619349616/t0(0) o104->fir-MDT0000@10.9.101.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566486502 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 22 08:08:22 fir-md1-s1 kernel: 
Lustre: 21452:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 22 08:08:27 fir-md1-s1 kernel: Lustre: 27318:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f32414c5700 x1631674765725840/t0(0) o36->a3f3d2bc-d481-b26b-e6da-3afa952c1e68@10.9.108.9@o2ib4:2/0 lens 552/2888 e 1 to 0 dl 1566486512 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 08:08:29 fir-md1-s1 kernel: LustreError: 21452:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.101.3@o2ib4) failed to reply to blocking AST (req@ffff8f2ae37abc00 x1636776619349616 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f33531eaf40/0x5d9ee6db0e41510f lrc: 4/0,0 mode: PR/PR res: [0x20002a121:0xf383:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.101.3@o2ib4 remote: 0xe05d94fabcead63a expref: 88 pid: 23581 timeout: 5601591 lvb_type: 0 Aug 22 08:08:29 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.101.3@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 22 08:08:29 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.101.3@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f33531eaf40/0x5d9ee6db0e41510f lrc: 3/0,0 mode: PR/PR res: [0x20002a121:0xf383:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.101.3@o2ib4 remote: 0xe05d94fabcead63a expref: 89 pid: 23581 timeout: 0 lvb_type: 0 Aug 22 08:11:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7e2f1b77-e605-f279-b45d-e428b3d96daf (at 10.9.101.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f42ce000, cur 1566486661 expire 1566486511 last 1566486434 Aug 22 08:11:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 08:12:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7c13660c-b743-3f11-23de-3221a2e02958 (at 10.9.106.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f45361b8800, cur 1566486770 expire 1566486620 last 1566486543 Aug 22 08:12:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 08:37:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 22 08:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 61e04965-115a-09f2-6310-7831ec602a59 (at 10.9.0.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1619942c00, cur 1566489366 expire 1566489216 last 1566489139 Aug 22 08:56:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 09:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 268c2f86-5cd0-78bf-a064-50cf69aa4202 (at 10.8.7.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2ee3d11c00, cur 1566492903 expire 1566492753 last 1566492676 Aug 22 09:55:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 09:56:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ea1cdf3e-c1a9-c826-73a8-fd54bacafbe5 (at 10.8.7.4@o2ib6) Aug 22 09:56:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 10:53:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 09178838-ce52-4043-1e0e-21a0c9717f63 (at 10.9.106.52@o2ib4) Aug 22 10:53:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 11:29:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 22 11:29:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 11:35:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ea1cdf3e-c1a9-c826-73a8-fd54bacafbe5 (at 10.8.7.4@o2ib6) Aug 22 11:35:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:13:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00934882-5919-cf7b-4596-d940346461e2 (at 10.9.106.35@o2ib4) Aug 22 12:13:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:13:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 427422bc-2dc5-3613-8742-f2dc7e69d571 (at 10.9.106.27@o2ib4) Aug 22 12:13:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:13:52 fir-md1-s1 kernel: Lustre: 20553:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566501225/real 1566501225] req@ffff8f4089a75400 x1636776696497744/t0(0) o104->fir-MDT0000@10.9.102.66@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566501232 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 22 12:13:52 fir-md1-s1 kernel: Lustre: 20553:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 22 12:13:59 fir-md1-s1 kernel: Lustre: 20553:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1566501232/real 1566501232] req@ffff8f4089a75400 x1636776696497744/t0(0) o104->fir-MDT0000@10.9.102.66@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566501239 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 22 12:14:00 fir-md1-s1 kernel: Lustre: 23680:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f419435c800 x1631593152114880/t0(0) o36->616a4c10-d92e-f14d-6de7-aaa055164380@10.9.102.51@o2ib4:5/0 lens 488/3152 e 1 to 0 dl 1566501245 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 12:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 616a4c10-d92e-f14d-6de7-aaa055164380 (at 10.9.102.51@o2ib4) reconnecting Aug 22 12:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 68156f2b-71f5-36b7-b4a4-c1e98fddd93b (at 10.9.102.51@o2ib4) Aug 22 12:14:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:14:08 fir-md1-s1 kernel: Lustre: 20726:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566501241/real 1566501241] req@ffff8f275d34aa00 x1636776696539136/t0(0) o104->fir-MDT0000@10.9.102.66@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566501248 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 22 12:14:08 fir-md1-s1 kernel: Lustre: 20726:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 22 12:14:10 fir-md1-s1 kernel: Lustre: 23577:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f077267bc00 x1635099101246016/t0(0) o36->d6c95989-a33e-02cc-37c5-1e98ca81c68c@10.9.105.2@o2ib4:14/0 lens 488/3152 e 1 to 0 dl 1566501254 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 12:14:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d6c95989-a33e-02cc-37c5-1e98ca81c68c (at 10.9.105.2@o2ib4) reconnecting Aug 22 12:14:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 05b3043c-4908-9cdc-059e-e4079a4cb50a (at 10.9.105.2@o2ib4) Aug 22 12:14:17 fir-md1-s1 kernel: Lustre: 
23680:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f453574f200 x1631590103474832/t0(0) o36->7f822e06-cec1-e17d-9d94-250ae612cdd0@10.9.102.54@o2ib4:22/0 lens 488/3152 e 1 to 0 dl 1566501262 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 12:14:20 fir-md1-s1 kernel: LustreError: 20553:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.102.66@o2ib4) failed to reply to blocking AST (req@ffff8f4089a75400 x1636776696497744 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f2309253cc0/0x5d9ee6db5aaf943c lrc: 4/0,0 mode: PR/PR res: [0x200029a4c:0xc69:0x0].0x0 bits 0x1b/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.9.102.66@o2ib4 remote: 0x20550925d3e9a097 expref: 1478 pid: 23581 timeout: 5616342 lvb_type: 0 Aug 22 12:14:20 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.102.66@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 22 12:14:20 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.102.66@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f2309253cc0/0x5d9ee6db5aaf943c lrc: 3/0,0 mode: PR/PR res: [0x200029a4c:0xc69:0x0].0x0 bits 0x1b/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.9.102.66@o2ib4 remote: 0x20550925d3e9a097 expref: 1479 pid: 23581 timeout: 0 lvb_type: 0 Aug 22 12:16:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 531b277b-5034-b534-741c-dbf6d1c1c863 (at 10.9.102.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148df57000, cur 1566501416 expire 1566501266 last 1566501189 Aug 22 12:16:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:19:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6558db06-e84b-e314-9758-e1f758d5cd4e (at 10.9.107.14@o2ib4) Aug 22 12:19:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 28fe699e-59db-fbbf-130e-a39e745a03cb (at 10.9.106.20@o2ib4) Aug 22 12:19:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 22 12:20:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 3013729c-f845-340f-996a-9bb9ee834075 (at 10.9.106.19@o2ib4) Aug 22 12:20:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:21:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5619c86a-15ea-6217-4295-ed04c724fa9d (at 10.9.106.31@o2ib4) Aug 22 12:21:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 22 12:31:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to eb063a87-3fc2-413e-e5a9-3ea270493202 (at 10.8.27.27@o2ib6) Aug 22 12:31:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:53:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b8b7cdf4-1073-650a-6269-b5bba9cefb37 (at 10.9.101.7@o2ib4) Aug 22 12:53:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:56:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 22 12:56:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 12:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f4aede13-725b-9160-95c0-802d1d063790 (at 10.9.106.36@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2b0c206800, cur 1566503998 expire 1566503848 last 1566503771 Aug 22 12:59:58 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 22 13:05:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e1008ff1-7911-4d3d-cd72-11efd094b730 (at 10.8.8.30@o2ib6) Aug 22 13:05:22 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 22 14:27:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 22 14:27:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 14:30:09 fir-md1-s1 kernel: Lustre: 23571:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566509402/real 1566509402] req@ffff8f1423263900 x1636776780116144/t0(0) o104->fir-MDT0000@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566509409 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 22 14:30:09 fir-md1-s1 kernel: Lustre: 23571:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Aug 22 14:30:13 fir-md1-s1 kernel: Lustre: 23657:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566509406/real 1566509406] req@ffff8f2d9b3fb600 x1636776780143744/t0(0) o104->fir-MDT0000@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566509413 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 22 14:30:17 fir-md1-s1 kernel: Lustre: 23589:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ff2a68300 x1641985991554752/t0(0) o101->78efd63b-7105-c2cc-db45-dddfecbe5e5c@10.9.112.16@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1566509422 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 14:30:20 fir-md1-s1 kernel: Lustre: 23657:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566509413/real 1566509413] req@ffff8f2d9b3fb600 x1636776780143744/t0(0) o104->fir-MDT0000@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566509420 ref 1 
fl Rpc:X/2/ffffffff rc 0/-1 Aug 22 14:30:20 fir-md1-s1 kernel: Lustre: 23657:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 22 14:30:21 fir-md1-s1 kernel: Lustre: 23749:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0feee95d00 x1633742527495264/t0(0) o101->00a6bf4a-1a11-675b-07eb-2392e93c70c7@10.8.29.8@o2ib6:26/0 lens 480/568 e 1 to 0 dl 1566509426 ref 2 fl Interpret:/0/0 rc 0/0 Aug 22 14:30:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 78efd63b-7105-c2cc-db45-dddfecbe5e5c (at 10.9.112.16@o2ib4) reconnecting Aug 22 14:30:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 286d4aef-dd39-033a-885a-1b2f68dad8ee (at 10.9.112.16@o2ib4) Aug 22 14:30:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 14:30:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 00a6bf4a-1a11-675b-07eb-2392e93c70c7 (at 10.8.29.8@o2ib6) reconnecting Aug 22 14:30:30 fir-md1-s1 kernel: Lustre: 23571:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566509423/real 1566509423] req@ffff8f1423263900 x1636776780116144/t0(0) o104->fir-MDT0000@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566509430 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 22 14:30:30 fir-md1-s1 kernel: Lustre: 23571:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 22 14:30:37 fir-md1-s1 kernel: LustreError: 23571:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.112.15@o2ib4) failed to reply to blocking AST (req@ffff8f1423263900 x1636776780116144 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1600167bc0/0x5d9ee6db7af72163 lrc: 4/0,0 mode: PR/PR res: [0x20002a1ef:0x162a:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.9.112.15@o2ib4 remote: 0x7dfbbdf2a397ce13 expref: 6087 pid: 20734 timeout: 5624519 lvb_type: 0 Aug 22 14:30:37 fir-md1-s1 kernel: 
LustreError: 138-a: fir-MDT0000: A client on nid 10.9.112.15@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 22 14:30:37 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.112.15@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f1600167bc0/0x5d9ee6db7af72163 lrc: 3/0,0 mode: PR/PR res: [0x20002a1ef:0x162a:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.9.112.15@o2ib4 remote: 0x7dfbbdf2a397ce13 expref: 6088 pid: 20734 timeout: 0 lvb_type: 0 Aug 22 14:33:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d93e5856-fa14-188b-47c0-360c358f8770 (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1d4b5dd400, cur 1566509612 expire 1566509462 last 1566509385 Aug 22 14:33:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 14:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a97cc4f6-6e92-7669-8f75-f73e64eb3df2 (at 10.9.108.35@o2ib4) Aug 22 14:42:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 14:47:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98a2e267-7ec4-26e6-8e49-234410a6b030 (at 10.9.108.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e3ee0f000, cur 1566510449 expire 1566510299 last 1566510222 Aug 22 14:47:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 22 14:58:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2cfafa70-b330-11ee-9222-b6c3a49b44eb (at 10.9.106.23@o2ib4) Aug 22 15:10:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6558db06-e84b-e314-9758-e1f758d5cd4e (at 10.9.107.14@o2ib4) Aug 22 15:10:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 15:11:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 427422bc-2dc5-3613-8742-f2dc7e69d571 (at 10.9.106.27@o2ib4) Aug 22 15:11:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 15:11:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 00934882-5919-cf7b-4596-d940346461e2 (at 10.9.106.35@o2ib4) Aug 22 15:11:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 17:24:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 17da4e75-34c6-9611-64b7-dd7bf0ceb9f0 (at 10.9.106.40@o2ib4) Aug 22 17:24:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 22 18:25:41 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 22 18:25:41 fir-md1-s1 kernel: Lustre: 20571:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 22 18:26:29 fir-md1-s1 kernel: Lustre: 10588:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 22 18:48:42 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 22 18:48:42 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 22 18:50:42 fir-md1-s1 kernel: Lustre: 27319:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 22 
19:00:30 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 22 19:00:30 fir-md1-s1 kernel: Lustre: 21420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 22 19:15:54 fir-md1-s1 kernel: Lustre: 23560:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 23 03:22:29 fir-md1-s1 kernel: LustreError: 31016:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.108.35@o2ib4 arrived at 1566555749 with bad export cookie 6746083122229783700 Aug 23 03:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a97cc4f6-6e92-7669-8f75-f73e64eb3df2 (at 10.9.108.35@o2ib4) Aug 23 03:22:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 03:26:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98a2e267-7ec4-26e6-8e49-234410a6b030 (at 10.9.108.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0ba345f400, cur 1566555976 expire 1566555826 last 1566555749 Aug 23 09:12:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 21d35c87-c27c-bca9-1f9e-267ed20d4b9a (at 10.8.0.3@o2ib6) Aug 23 09:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bb17aca1-57d8-f36a-a79b-bcdcd36ec002 (at 10.8.18.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16a37ccc00, cur 1566577036 expire 1566576886 last 1566576809 Aug 23 09:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0895fc36-b0a5-f2ce-fc2e-959f841648ef (at 10.8.19.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ab2063c00, cur 1566578064 expire 1566577914 last 1566577837 Aug 23 09:34:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 23 09:35:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5e90b32f-f588-dfef-191f-169796896533 (at 10.8.11.36@o2ib6) Aug 23 09:35:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 09:35:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.36@o2ib6, removing former export from same NID Aug 23 09:35:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5e90b32f-f588-dfef-191f-169796896533 (at 10.8.11.36@o2ib6) Aug 23 09:36:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7e64fcba-461c-e286-f780-b934c678bb43 (at 10.8.10.21@o2ib6) Aug 23 09:36:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 09:36:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2f7289d2-d04f-dbbb-901b-5cb0d8c60bd1 (at 10.8.11.23@o2ib6) Aug 23 09:36:31 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 23 09:36:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bce6b9fa-7160-9112-2bc7-9425baad266a (at 10.8.10.11@o2ib6) Aug 23 09:37:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 714557b1-cf03-1d76-a357-b6ca0e0916e0 (at 10.8.11.15@o2ib6) Aug 23 09:37:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 09:43:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.24@o2ib4) Aug 23 09:43:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 09:45:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 45b434ea-97b5-6b42-37ec-7e634e39fe74 (at 10.9.106.71@o2ib4) Aug 23 09:45:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 09:46:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2db05c2c-17f5-0add-9e07-554dfdaaa304 (at 10.9.106.25@o2ib4) Aug 23 09:46:23 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 23 09:53:34 
fir-md1-s1 kernel: Lustre: MGS: Connection restored to c77683c2-9a5d-9ea9-6036-59b9504cf12b (at 10.9.101.3@o2ib4) Aug 23 09:53:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 09:57:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 699d81f5-ca90-8c24-9fb5-0969e297b451 (at 10.8.18.20@o2ib6) Aug 23 09:57:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 23 10:07:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 685cfb0c-586a-71ac-1cab-5ddabdd9e249 (at 10.8.11.27@o2ib6) Aug 23 10:07:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 10:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c60787b3-83f3-623a-577c-300b22abaee7 (at 10.9.103.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f20c74db800, cur 1566580189 expire 1566580039 last 1566579962 Aug 23 10:09:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 10:11:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5316ccca-e14a-d86e-7a5a-aea56fb554aa (at 10.8.11.15@o2ib6) in 183 seconds. I think it's dead, and I am evicting it. exp ffff8f28eb2fa800, cur 1566580265 expire 1566580115 last 1566580082 Aug 23 10:11:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 10:11:49 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 069658d9-e634-914f-4dd5-a43c8538d52d (at 10.8.11.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33ab686800, cur 1566580309 expire 1566580159 last 1566580082 Aug 23 10:11:49 fir-md1-s1 kernel: Lustre: Skipped 55 previous similar messages Aug 23 10:19:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1ba6ce6a-96d2-9d97-e670-f3b26c5cbf04 (at 10.9.109.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f148b3cfc00, cur 1566580771 expire 1566580621 last 1566580544 Aug 23 10:19:31 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 23 10:19:34 fir-md1-s1 kernel: Lustre: 23648:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 10:23:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e154d66f-ead9-6f0d-f306-8e351668402d (at 10.9.102.1@o2ib4) Aug 23 10:23:12 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Aug 23 10:41:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 29e229ef-0b7d-e0ce-48dd-1c614dad7928 (at 10.9.112.15@o2ib4) Aug 23 10:41:45 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 23 10:55:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 450ef4e9-4778-bb1a-e7eb-47a1da87d4c0 (at 10.9.101.60@o2ib4) Aug 23 10:55:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 23 11:05:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 13b1d176-b53a-96dd-927a-0f7738938e51 (at 10.8.10.1@o2ib6) Aug 23 11:05:55 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 23 11:22:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c868681d-14c0-a2d1-2354-21f32e094f44 (at 10.8.25.23@o2ib6) Aug 23 11:22:10 fir-md1-s1 kernel: Lustre: Skipped 206 previous similar messages Aug 23 13:31:12 fir-md1-s1 kernel: Lustre: 21417:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566592265/real 1566592265] req@ffff8f11bf1c0f00 x1636777835208544/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592272 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 23 13:31:12 fir-md1-s1 kernel: Lustre: 21417:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 23 13:31:19 fir-md1-s1 kernel: Lustre: 10304:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566592272/real 1566592272] 
req@ffff8f0e19d3fb00 x1636777835208784/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592279 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 13:31:19 fir-md1-s1 kernel: Lustre: 10304:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 23 13:31:20 fir-md1-s1 kernel: Lustre: 21670:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0ebbd04b00 x1642448131563376/t0(0) o101->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:25/0 lens 480/568 e 1 to 0 dl 1566592285 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 13:31:26 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566592279/real 1566592279] req@ffff8f0707759800 x1636777835208896/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592286 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 13:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 13:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 13:31:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 13:31:40 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566592293/real 1566592293] req@ffff8f0707759800 x1636777835208896/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592300 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 13:31:40 fir-md1-s1 kernel: Lustre: 23560:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 23 13:31:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 13:32:01 fir-md1-s1 kernel: Lustre: 21417:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1566592314/real 1566592314] req@ffff8f11bf1c0f00 x1636777835208544/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592321 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 13:32:01 fir-md1-s1 kernel: Lustre: 21417:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 23 13:32:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 13:32:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 13:32:36 fir-md1-s1 kernel: Lustre: 10304:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566592349/real 1566592349] req@ffff8f0e19d3fb00 x1636777835208784/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592356 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 13:32:36 fir-md1-s1 kernel: Lustre: 10304:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages Aug 23 13:32:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 13:32:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 13:32:51 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 23 13:33:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 13:33:46 fir-md1-s1 kernel: Lustre: 21417:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566592419/real 1566592419] req@ffff8f11bf1c0f00 x1636777835208544/t0(0) o106->fir-MDT0002@10.9.112.14@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566592426 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 13:33:46 fir-md1-s1 kernel: Lustre: 
21417:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 28 previous similar messages Aug 23 13:34:26 fir-md1-s1 kernel: LNet: Service thread pid 21417 was inactive for 200.72s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 23 13:34:26 fir-md1-s1 kernel: Pid: 21417, comm: mdt00_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 23 13:34:26 fir-md1-s1 kernel: Call Trace: Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 23 13:34:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 23 13:34:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 23 13:34:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566592466.21417 Aug 23 13:34:26 fir-md1-s1 kernel: Pid: 10304, comm: mdt00_036 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 23 13:34:26 fir-md1-s1 kernel: Call Trace: Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 
[ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 23 13:34:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 23 13:34:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 23 13:34:26 fir-md1-s1 kernel: Pid: 23560, comm: mdt00_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 23 13:34:26 fir-md1-s1 kernel: Call Trace: Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 
kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 23 13:34:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 23 13:34:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 23 13:34:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 23 13:34:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9bacdbac-38ad-ff82-227e-bc06d5aa6bed (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f097a6e9c00, cur 1566592482 expire 1566592332 last 1566592255 Aug 23 13:34:42 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 23 13:34:42 fir-md1-s1 kernel: Lustre: 23560:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (147:70s); client may timeout. req@ffff8f1378abcb00 x1642448131565744/t0(0) o101->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:25/0 lens 480/536 e 1 to 0 dl 1566592412 ref 1 fl Complete:/0/0 rc 301/301 Aug 23 13:34:42 fir-md1-s1 kernel: LNet: Service thread pid 10304 completed after 216.79s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 23 13:34:42 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 23 13:34:42 fir-md1-s1 kernel: Lustre: 23560:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 23 13:56:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98985c45-beb0-4374-1294-1aa55f308170 (at 10.9.101.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f19cfc2cc00, cur 1566593777 expire 1566593627 last 1566593550 Aug 23 13:56:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 14:24:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 450ef4e9-4778-bb1a-e7eb-47a1da87d4c0 (at 10.9.101.60@o2ib4) Aug 23 14:24:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 14:30:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9c358326-94eb-7ea7-c880-af837f4a287d (at 10.9.101.59@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4399950800, cur 1566595858 expire 1566595708 last 1566595631 Aug 23 14:30:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 14:43:37 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 14:58:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5d6eb12a-986d-3a49-d3ca-602d8bd21b2a (at 10.9.101.59@o2ib4) Aug 23 14:58:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 15:28:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 23 15:28:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 15:46:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2a3aeab0-1050-b9bb-39ea-0e1d8517c964 (at 10.9.101.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0899b3b000, cur 1566600399 expire 1566600249 last 1566600172 Aug 23 15:46:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 23 15:46:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b8b7cdf4-1073-650a-6269-b5bba9cefb37 (at 10.9.101.7@o2ib4) Aug 23 15:46:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 16:04:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 50004cb6-ccab-c6db-a14f-19178b9d9f0e (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2130ff4400, cur 1566601477 expire 1566601327 last 1566601250 Aug 23 16:04:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 16:04:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 50004cb6-ccab-c6db-a14f-19178b9d9f0e (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f37202ce400, cur 1566601480 expire 1566601330 last 1566601253 Aug 23 16:04:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 16:20:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 44538503-d08c-cc53-6d76-5a2aae25ceaa (at 10.8.0.68@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1e2abf3800, cur 1566602400 expire 1566602250 last 1566602173 Aug 23 17:03:18 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 17:05:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 23 17:05:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 17:20:23 fir-md1-s1 kernel: Lustre: 21369:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:25:54 fir-md1-s1 kernel: Lustre: 23589:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:26:57 fir-md1-s1 kernel: Lustre: 23689:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:39:20 fir-md1-s1 kernel: Lustre: 27321:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:43:21 fir-md1-s1 kernel: Lustre: 20459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:45:07 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:45:07 fir-md1-s1 kernel: Lustre: 21419:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 23 17:53:03 fir-md1-s1 kernel: Lustre: 23560:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 17:56:31 fir-md1-s1 kernel: Lustre: 20732:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0f69b3f200 x1642433483925056/t0(0) o101->15ab5365-c72f-797a-6b04-708c42fac9fc@10.9.110.22@o2ib4:6/0 lens 480/568 e 1 to 0 dl 1566608196 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 17:56:31 fir-md1-s1 
kernel: Lustre: 20732:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 23 17:56:31 fir-md1-s1 kernel: Lustre: 21128:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3932729200 x1642435707145536/t0(0) o101->4c34a5f1-b39e-2aad-9006-ed9c087022ce@10.9.110.18@o2ib4:6/0 lens 480/568 e 1 to 0 dl 1566608196 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 17:56:32 fir-md1-s1 kernel: Lustre: 21074:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3fa619d700 x1640054582261952/t0(0) o101->abc67aa0-7dc1-590d-7866-f6617627f2fb@10.9.107.63@o2ib4:7/0 lens 480/568 e 1 to 0 dl 1566608197 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 17:56:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 4c34a5f1-b39e-2aad-9006-ed9c087022ce (at 10.9.110.18@o2ib4) reconnecting Aug 23 17:56:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 961908c8-c2fb-1aff-1ef3-8d07053495ce (at 10.9.110.22@o2ib4) Aug 23 17:56:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 17:56:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 17:56:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2f7f82ee-90c5-d2c3-da56-f04f86d3c2a5 (at 10.9.107.63@o2ib4) Aug 23 17:56:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 17:56:40 fir-md1-s1 kernel: Lustre: 23708:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f070c318f00 x1631550512829584/t0(0) o101->07289107-15cf-b70a-a8d8-67d0d32bbec1@10.9.108.29@o2ib4:15/0 lens 480/568 e 0 to 0 dl 1566608205 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 17:56:44 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.108.25@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1b83a3d100/0x5d9ee6ddfa225990 lrc: 
3/0,0 mode: PW/PW res: [0x2c002c4ff:0xde7:0x0].0x0 bits 0x40/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.108.25@o2ib4 remote: 0xa65688fd2e80279b expref: 494 pid: 22286 timeout: 5723264 lvb_type: 0 Aug 23 17:56:46 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.107.47@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f323e9fe300/0x5d9ee6ddfa4c09b8 lrc: 3/0,0 mode: PW/PW res: [0x2c002bdeb:0xd368:0x0].0x0 bits 0x40/0x0 rrc: 32 type: IBT flags: 0x60200400000020 nid: 10.9.107.47@o2ib4 remote: 0xf059616fce911363 expref: 507 pid: 20463 timeout: 5723266 lvb_type: 0 Aug 23 17:56:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to f5f92765-342a-418f-7600-2b9a7cbc8f11 (at 10.9.108.25@o2ib4) Aug 23 17:56:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b03523f3-7393-4682-6529-e841828fdc86 (at 10.9.103.39@o2ib4) reconnecting Aug 23 17:56:49 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 23 17:56:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5faaffa4-b794-a808-9491-b5e7b4312905 (at 10.9.107.47@o2ib4) Aug 23 17:56:56 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 23 17:56:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 15ab5365-c72f-797a-6b04-708c42fac9fc (at 10.9.110.22@o2ib4) reconnecting Aug 23 18:02:46 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:04:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 23 18:04:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 23 18:05:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.20@o2ib6, removing former export from same NID Aug 23 18:06:09 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, 
don't perform health checking (0, 5) Aug 23 18:06:16 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:06:16 fir-md1-s1 kernel: Lustre: 23594:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 23 18:06:44 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:06:44 fir-md1-s1 kernel: Lustre: 23703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Aug 23 18:07:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 30b46016-5e0f-ddd3-494b-68e306e1f0e9 (at 10.9.104.30@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f41980f2400, cur 1566608830 expire 1566608680 last 1566608603 Aug 23 18:07:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 18:08:31 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:08:31 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Aug 23 18:10:50 fir-md1-s1 kernel: Lustre: 20738:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:13:19 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:13:19 fir-md1-s1 kernel: Lustre: 10506:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Aug 23 18:15:22 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 18:16:10 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 
18:18:34 fir-md1-s1 kernel: Lustre: 23580:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:18:34 fir-md1-s1 kernel: Lustre: 23580:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 20 previous similar messages Aug 23 18:20:19 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 23 18:20:19 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 23 18:20:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b50adf7b-1eb0-aced-6fea-489b596a7b56 (at 10.9.101.50@o2ib4) reconnecting Aug 23 18:20:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 83acfeb9-f28f-d51c-b3c4-69b2ea2161d2 (at 10.9.101.8@o2ib4) Aug 23 18:20:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 23 18:20:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 23 18:20:43 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 18:25:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 714557b1-cf03-1d76-a357-b6ca0e0916e0 (at 10.8.11.15@o2ib6) Aug 23 18:25:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 18:26:20 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 18:28:07 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:28:07 fir-md1-s1 kernel: Lustre: 21311:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages Aug 23 18:37:56 fir-md1-s1 kernel: Lustre: 35231:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0f6c1fe050 
x1638881798674864/t0(0) o4->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:1/0 lens 488/448 e 1 to 0 dl 1566610681 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 18:37:56 fir-md1-s1 kernel: Lustre: 35231:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Aug 23 18:38:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2f627314-68e3-35d2-70d7-0cd2604dd048 (at 10.9.115.4@o2ib4) reconnecting Aug 23 18:38:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 4ad0ddc3-f24b-a3c1-14b7-f4027df128f0 (at 10.9.115.4@o2ib4) Aug 23 18:38:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 18:38:03 fir-md1-s1 kernel: Lustre: 49250:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. req@ffff8f0f6c1fe050 x1638881798674864/t360452657621(0) o4->2f627314-68e3-35d2-70d7-0cd2604dd048@10.9.115.4@o2ib4:1/0 lens 488/416 e 1 to 0 dl 1566610681 ref 1 fl Complete:/0/0 rc 0/0 Aug 23 18:38:07 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:38:07 fir-md1-s1 kernel: Lustre: 23660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages Aug 23 18:49:35 fir-md1-s1 kernel: LNetError: 20183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 18:49:50 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 18:49:50 fir-md1-s1 kernel: Lustre: 23599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Aug 23 18:58:02 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 18:58:35 fir-md1-s1 kernel: Lustre: 20726:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 
9206683, FID: [0x200029a43:0x1723d:0x0]) is approaching maximum entry limit Aug 23 19:02:51 fir-md1-s1 kernel: Lustre: 23648:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 19:02:51 fir-md1-s1 kernel: Lustre: 23648:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Aug 23 19:05:39 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 23 19:10:44 fir-md1-s1 kernel: Lustre: 22281:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) is approaching maximum entry limit Aug 23 19:20:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_045: index 2: reach max htree level 2 Aug 23 19:20:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:20:59 fir-md1-s1 kernel: Lustre: 22283:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:03 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_092: index 2: reach max htree level 2 Aug 23 19:21:03 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:03 fir-md1-s1 kernel: Lustre: 97653:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:03 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_086: index 2: reach max htree level 2 Aug 23 19:21:03 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): 
ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:05 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_034: index 2: reach max htree level 2 Aug 23 19:21:05 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:05 fir-md1-s1 kernel: Lustre: 10198:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:05 fir-md1-s1 kernel: Lustre: 10198:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 1 previous similar message Aug 23 19:21:06 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_102: index 2: reach max htree level 2 Aug 23 19:21:06 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:06 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_095: index 2: reach max htree level 2 Aug 23 19:21:06 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:07 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 23 19:21:07 fir-md1-s1 kernel: Lustre: 23633:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 23 19:21:14 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_049: index 2: reach max htree level 2 Aug 23 19:21:14 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:14 fir-md1-s1 kernel: Lustre: 
22287:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:14 fir-md1-s1 kernel: Lustre: 22287:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 2 previous similar messages Aug 23 19:21:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_110: index 2: reach max htree level 2 Aug 23 19:21:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:18 fir-md1-s1 kernel: Lustre: 23736:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:22 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_078: index 2: reach max htree level 2 Aug 23 19:21:22 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:25 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_062: index 2: reach max htree level 2 Aug 23 19:21:25 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_073: index 2: reach max htree level 2 Aug 23 19:21:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:27 fir-md1-s1 kernel: Lustre: 23658:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:27 fir-md1-s1 kernel: Lustre: 
23658:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 2 previous similar messages Aug 23 19:21:29 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_038: index 2: reach max htree level 2 Aug 23 19:21:29 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:33 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_028: index 2: reach max htree level 2 Aug 23 19:21:33 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:35 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_095: index 2: reach max htree level 2 Aug 23 19:21:35 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:36 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_063: index 2: reach max htree level 2 Aug 23 19:21:36 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:37 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_037: index 2: reach max htree level 2 Aug 23 19:21:37 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:39 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_005: index 2: reach max htree level 2 Aug 23 19:21:39 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:42 fir-md1-s1 
kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_003: index 2: reach max htree level 2 Aug 23 19:21:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_079: index 2: reach max htree level 2 Aug 23 19:21:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:44 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_111: index 2: reach max htree level 2 Aug 23 19:21:44 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:44 fir-md1-s1 kernel: Lustre: 23737:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:21:44 fir-md1-s1 kernel: Lustre: 23737:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 8 previous similar messages Aug 23 19:21:45 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_055: index 2: reach max htree level 2 Aug 23 19:21:45 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_039: index 2: reach max htree level 2 Aug 23 19:21:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:54 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_062: 
index 2: reach max htree level 2 Aug 23 19:21:54 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:21:57 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_024: index 2: reach max htree level 2 Aug 23 19:21:57 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:03 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 23 19:22:05 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_023: index 2: reach max htree level 2 Aug 23 19:22:05 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_053: index 2: reach max htree level 2 Aug 23 19:22:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b731aa74-f761-f808-ac4e-60997bf2bd97 (at 10.9.101.51@o2ib4) reconnecting Aug 23 19:22:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c77683c2-9a5d-9ea9-6036-59b9504cf12b (at 10.9.101.3@o2ib4) Aug 23 19:22:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 19:22:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_079: index 2: reach max htree level 2 Aug 23 19:22:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:13 fir-md1-s1 kernel: 
LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_039: index 2: reach max htree level 2 Aug 23 19:22:13 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:14 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_015: index 2: reach max htree level 2 Aug 23 19:22:14 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:20 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_082: index 2: reach max htree level 2 Aug 23 19:22:20 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:20 fir-md1-s1 kernel: Lustre: 23612:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:22:20 fir-md1-s1 kernel: Lustre: 23612:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 9 previous similar messages Aug 23 19:22:21 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_000: index 2: reach max htree level 2 Aug 23 19:22:21 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:22 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_046: index 2: reach max htree level 2 Aug 23 19:22:22 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_015: index 
2: reach max htree level 2 Aug 23 19:22:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:30 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_072: index 2: reach max htree level 2 Aug 23 19:22:30 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:38 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_038: index 2: reach max htree level 2 Aug 23 19:22:38 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_005: index 2: reach max htree level 2 Aug 23 19:22:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:45 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_073: index 2: reach max htree level 2 Aug 23 19:22:45 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_102: index 2: reach max htree level 2 Aug 23 19:22:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:52 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_055: index 2: reach max htree level 2 Aug 23 19:22:52 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): 
ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_067: index 2: reach max htree level 2 Aug 23 19:22:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:22:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_041: index 2: reach max htree level 2 Aug 23 19:22:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:04 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_078: index 2: reach max htree level 2 Aug 23 19:23:04 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:07 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_015: index 2: reach max htree level 2 Aug 23 19:23:07 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_103: index 2: reach max htree level 2 Aug 23 19:23:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:32 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_021: index 2: reach max htree level 2 Aug 23 19:23:32 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:32 fir-md1-s1 
kernel: Lustre: 21681:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:23:32 fir-md1-s1 kernel: Lustre: 21681:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 14 previous similar messages Aug 23 19:23:36 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_106: index 2: reach max htree level 2 Aug 23 19:23:36 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:38 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_015: index 2: reach max htree level 2 Aug 23 19:23:38 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:43 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_040: index 2: reach max htree level 2 Aug 23 19:23:43 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:50 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_001: index 2: reach max htree level 2 Aug 23 19:23:50 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:50 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_078: index 2: reach max htree level 2 Aug 23 19:23:50 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:23:57 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm 
mdt01_005: index 2: reach max htree level 2 Aug 23 19:23:57 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_073: index 2: reach max htree level 2 Aug 23 19:24:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:07 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_041: index 2: reach max htree level 2 Aug 23 19:24:07 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_082: index 2: reach max htree level 2 Aug 23 19:24:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_104: index 2: reach max htree level 2 Aug 23 19:24:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_013: index 2: reach max htree level 2 Aug 23 19:24:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:09 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_102: index 2: reach max htree level 2 Aug 23 19:24:09 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): 
ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:12 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_111: index 2: reach max htree level 2 Aug 23 19:24:12 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:15 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_077: index 2: reach max htree level 2 Aug 23 19:24:15 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:16 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_063: index 2: reach max htree level 2 Aug 23 19:24:16 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_012: index 2: reach max htree level 2 Aug 23 19:24:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_048: index 2: reach max htree level 2 Aug 23 19:24:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:20 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_090: index 2: reach max htree level 2 Aug 23 19:24:20 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:23 fir-md1-s1 
kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_034: index 2: reach max htree level 2 Aug 23 19:24:23 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:26 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_012: index 2: reach max htree level 2 Aug 23 19:24:26 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_074: index 2: reach max htree level 2 Aug 23 19:24:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:28 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_073: index 2: reach max htree level 2 Aug 23 19:24:28 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:32 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_021: index 2: reach max htree level 2 Aug 23 19:24:32 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:39 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_015: index 2: reach max htree level 2 Aug 23 19:24:39 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:40 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_024: index 2: reach 
max htree level 2 Aug 23 19:24:40 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:41 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_037: index 2: reach max htree level 2 Aug 23 19:24:41 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_064: index 2: reach max htree level 2 Aug 23 19:24:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:47 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_020: index 2: reach max htree level 2 Aug 23 19:24:47 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:47 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_043: index 2: reach max htree level 2 Aug 23 19:24:47 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:55 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_021: index 2: reach max htree level 2 Aug 23 19:24:55 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_104: index 2: reach max htree level 2 Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: 
Large directory feature is not enabled on this filesystem Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_033: index 2: reach max htree level 2 Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_104: index 2: reach max htree level 2 Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_001: index 2: reach max htree level 2 Aug 23 19:24:56 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning: 6 callbacks suppressed Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_074: index 2: reach max htree level 2 Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_111: index 2: reach max htree level 2 Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_005: index 2: reach max htree level 2 Aug 23 19:25:01 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not 
enabled on this filesystem Aug 23 19:25:04 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_090: index 2: reach max htree level 2 Aug 23 19:25:04 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:07 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_078: index 2: reach max htree level 2 Aug 23 19:25:07 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:09 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_021: index 2: reach max htree level 2 Aug 23 19:25:09 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_094: index 2: reach max htree level 2 Aug 23 19:25:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_103: index 2: reach max htree level 2 Aug 23 19:25:11 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:17 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_072: index 2: reach max htree level 2 Aug 23 19:25:17 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:17 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): 
ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_093: index 2: reach max htree level 2 Aug 23 19:25:17 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:25 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_020: index 2: reach max htree level 2 Aug 23 19:25:25 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:28 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_058: index 2: reach max htree level 2 Aug 23 19:25:28 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:29 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_021: index 2: reach max htree level 2 Aug 23 19:25:29 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:31 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_053: index 2: reach max htree level 2 Aug 23 19:25:31 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:34 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_098: index 2: reach max htree level 2 Aug 23 19:25:34 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_015: index 2: reach max htree level 2 Aug 23 19:25:46 
fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:46 fir-md1-s1 kernel: Lustre: 20728:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:25:46 fir-md1-s1 kernel: Lustre: 20728:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 52 previous similar messages Aug 23 19:25:48 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_003: index 2: reach max htree level 2 Aug 23 19:25:48 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:25:58 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_062: index 2: reach max htree level 2 Aug 23 19:25:58 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:02 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_111: index 2: reach max htree level 2 Aug 23 19:26:02 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:02 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_037: index 2: reach max htree level 2 Aug 23 19:26:02 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:04 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_015: index 2: reach max htree level 2 Aug 23 19:26:04 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory 
feature is not enabled on this filesystem Aug 23 19:26:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_090: index 2: reach max htree level 2 Aug 23 19:26:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:09 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_094: index 2: reach max htree level 2 Aug 23 19:26:09 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:13 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_086: index 2: reach max htree level 2 Aug 23 19:26:13 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:17 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_041: index 2: reach max htree level 2 Aug 23 19:26:17 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_110: index 2: reach max htree level 2 Aug 23 19:26:27 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:29 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_102: index 2: reach max htree level 2 Aug 23 19:26:29 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:32 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): 
ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_111: index 2: reach max htree level 2 Aug 23 19:26:32 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:37 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_082: index 2: reach max htree level 2 Aug 23 19:26:37 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_067: index 2: reach max htree level 2 Aug 23 19:26:42 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_069: index 2: reach max htree level 2 Aug 23 19:26:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_073: index 2: reach max htree level 2 Aug 23 19:26:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_001: index 2: reach max htree level 2 Aug 23 19:26:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:26:59 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_111: index 2: reach max htree level 2 Aug 23 19:26:59 
fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:02 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_056: index 2: reach max htree level 2 Aug 23 19:27:02 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_098: index 2: reach max htree level 2 Aug 23 19:27:08 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:10 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_082: index 2: reach max htree level 2 Aug 23 19:27:10 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:26 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_004: index 2: reach max htree level 2 Aug 23 19:27:26 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:30 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_001: index 2: reach max htree level 2 Aug 23 19:27:30 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:35 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt01_037: index 2: reach max htree level 2 Aug 23 19:27:35 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not 
enabled on this filesystem Aug 23 19:27:41 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_030: index 2: reach max htree level 2 Aug 23 19:27:41 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:27:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_045: index 2: reach max htree level 2 Aug 23 19:27:46 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:28:15 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt02_090: index 2: reach max htree level 2 Aug 23 19:28:15 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:28:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_111: index 2: reach max htree level 2 Aug 23 19:28:18 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:28:41 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_001: index 2: reach max htree level 2 Aug 23 19:28:41 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:29:16 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt00_096: index 2: reach max htree level 2 Aug 23 19:29:16 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:30:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): 
ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_079: index 2: reach max htree level 2 Aug 23 19:30:53 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 19:30:53 fir-md1-s1 kernel: Lustre: 23670:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) fir-MDT0000: directory (inode: 9206683, FID: [0x200029a43:0x1723d:0x0]) has reached maximum entry limit Aug 23 19:30:53 fir-md1-s1 kernel: Lustre: 23670:0:(osd_handler.c:501:osd_ldiskfs_add_entry()) Skipped 30 previous similar messages Aug 23 19:32:13 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2618: inode 9206683: comm mdt03_067: index 2: reach max htree level 2 Aug 23 19:32:13 fir-md1-s1 kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry:2622: Large directory feature is not enabled on this filesystem Aug 23 20:28:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ba43b840-458e-2685-9882-541e09d19454 (at 10.8.10.35@o2ib6) Aug 23 20:28:20 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 23 20:28:21 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.10.35@o2ib6, removing former export from same NID Aug 23 20:28:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ba43b840-458e-2685-9882-541e09d19454 (at 10.8.10.35@o2ib6) Aug 23 20:32:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3e52b3a1-9f96-4efd-fbef-229691cf86bd (at 10.8.10.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f25e24a3400, cur 1566617527 expire 1566617377 last 1566617300 Aug 23 20:32:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 20:34:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ba43b840-458e-2685-9882-541e09d19454 (at 10.8.10.35@o2ib6) Aug 23 21:32:07 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566621120/real 1566621120] req@ffff8f13e2a63300 x1636778407370512/t0(0) o106->fir-MDT0002@10.9.110.16@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566621127 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 23 21:32:07 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Aug 23 21:32:15 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0f73681200 x1642451337533520/t0(0) o101->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:20/0 lens 480/568 e 1 to 0 dl 1566621140 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 21:32:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:32:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 21:32:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:32:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 21:32:28 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566621141/real 1566621141] req@ffff8f13e2a63300 x1636778407370512/t0(0) o106->fir-MDT0002@10.9.110.16@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566621148 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 21:32:28 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 23 21:32:42 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:32:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:33:03 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566621176/real 1566621176] req@ffff8f13e2a63300 x1636778407370512/t0(0) o106->fir-MDT0002@10.9.110.16@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566621183 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 21:33:03 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 23 21:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:33:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:33:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:33:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:34:13 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566621246/real 1566621246] req@ffff8f13e2a63300 x1636778407370512/t0(0) o106->fir-MDT0002@10.9.110.16@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566621253 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 21:34:13 fir-md1-s1 kernel: Lustre: 23599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 23 
21:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 23 21:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 23 21:35:20 fir-md1-s1 kernel: LNet: Service thread pid 23599 was inactive for 200.16s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 23 21:35:20 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 23 21:35:20 fir-md1-s1 kernel: Pid: 23599, comm: mdt00_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 23 21:35:20 fir-md1-s1 kernel: Call Trace: Aug 23 21:35:20 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 23 21:35:20 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 23 21:35:20 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 23 21:35:20 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 23 21:35:20 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 23 21:35:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 23 
21:35:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 23 21:35:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 23 21:35:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 23 21:35:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566621320.23599 Aug 23 21:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 10a9d07a-e915-cfa0-1be8-1dc73bd119cc (at 10.9.110.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f24e798b400, cur 1566621326 expire 1566621176 last 1566621099 Aug 23 21:35:26 fir-md1-s1 kernel: LNet: Service thread pid 23599 completed after 205.94s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 23 21:35:26 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 23 21:35:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 10a9d07a-e915-cfa0-1be8-1dc73bd119cc (at 10.9.110.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3fb2973800, cur 1566621328 expire 1566621178 last 1566621101 Aug 23 21:35:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 21:36:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f0a31a0f-a827-fd9c-2fb2-5b3fa59ec935 (at 10.9.110.16@o2ib4) Aug 23 21:36:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 22:25:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to af4b0a05-ded4-a989-9a67-2b5ae37628a1 (at 10.8.11.2@o2ib6) Aug 23 22:25:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 22:54:03 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566626036/real 1566626036] req@ffff8f15c94b9b00 x1636778463003632/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566626043 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 23 22:54:03 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Aug 23 22:54:11 fir-md1-s1 kernel: Lustre: 22005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f241e1cbf00 x1631626564818736/t0(0) o36->09300796-1183-3575-4e70-90c873be0aeb@10.9.109.3@o2ib4:16/0 lens 504/2888 e 1 to 0 dl 1566626056 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 22:54:16 fir-md1-s1 kernel: Lustre: 21424:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f43b5b33600 x1635208428222960/t0(0) o101->e41fe368-f843-ecd2-7796-8b7c697cca0c@10.9.109.38@o2ib4:21/0 lens 576/3264 e 1 to 0 dl 1566626061 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 22:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 09300796-1183-3575-4e70-90c873be0aeb (at 10.9.109.3@o2ib4) reconnecting Aug 23 22:54:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 22:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
fbfce38d-c7b8-7688-1c0c-e31ac46fde6a (at 10.9.109.3@o2ib4) Aug 23 22:54:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 23 22:54:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client e41fe368-f843-ecd2-7796-8b7c697cca0c (at 10.9.109.38@o2ib4) reconnecting Aug 23 22:54:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to dd0b0f6c-982c-24c6-36e1-96d435567074 (at 10.9.109.38@o2ib4) Aug 23 22:54:24 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1aff2f3c00 x1641510975974464/t0(0) o101->35d6a112-0638-6793-fa6f-01d5382812f6@10.9.109.30@o2ib4:29/0 lens 576/3264 e 0 to 0 dl 1566626069 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 22:54:24 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 23 22:54:24 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566626057/real 1566626057] req@ffff8f15c94b9b00 x1636778463003632/t0(0) o104->fir-MDT0000@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566626064 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 23 22:54:24 fir-md1-s1 kernel: Lustre: 21446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 23 22:54:26 fir-md1-s1 kernel: Lustre: 23602:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f05652f1800 x1635099809798128/t0(0) o101->5ca073fc-7dd7-aa9f-392c-0231b32bdfbc@10.9.109.29@o2ib4:1/0 lens 576/3264 e 0 to 0 dl 1566626071 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 22:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 9f83b261-6c42-e810-5a55-38c82409d8c5 (at 10.9.109.30@o2ib4) Aug 23 22:54:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 23 22:54:30 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ 
Couldn't add any time (5/-5), not sending early reply req@ffff8f29886f5700 x1640808129014448/t0(0) o101->47628428-af30-5100-b884-9d904590e2a6@10.9.109.6@o2ib4:5/0 lens 576/3264 e 0 to 0 dl 1566626075 ref 2 fl Interpret:/0/0 rc 0/0 Aug 23 22:54:30 fir-md1-s1 kernel: Lustre: 23652:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 23 22:54:31 fir-md1-s1 kernel: LustreError: 21446:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.109.37@o2ib4) failed to reply to blocking AST (req@ffff8f15c94b9b00 x1636778463003632 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f317c92aac0/0x5d9ee6dfb592f1dc lrc: 4/0,0 mode: PR/PR res: [0x200029876:0x752:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.109.37@o2ib4 remote: 0xf78a51c9f2e5ef3f expref: 271 pid: 10144 timeout: 5741153 lvb_type: 0 Aug 23 22:54:31 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.109.37@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 23 22:54:31 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.109.37@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f317c92aac0/0x5d9ee6dfb592f1dc lrc: 3/0,0 mode: PR/PR res: [0x200029876:0x752:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.109.37@o2ib4 remote: 0xf78a51c9f2e5ef3f expref: 272 pid: 10144 timeout: 0 lvb_type: 0 Aug 23 22:57:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b691bdc1-56f0-4fa2-bd5b-5663cb01b5ba (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f42050d0800, cur 1566626254 expire 1566626104 last 1566626027 Aug 23 22:57:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4f5e6aca-fb16-1d79-3cb8-5e0a1ab49b15 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f05656a3800, cur 1566626260 expire 1566626110 last 1566626033 Aug 24 03:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1d9bbb43-a6f6-8fcf-8416-e1652b096042 (at 10.9.112.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f128c973800, cur 1566643751 expire 1566643601 last 1566643524 Aug 24 03:50:35 fir-md1-s1 kernel: Lustre: 21368:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0a29a31b00 x1642453122406976/t0(0) o53->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:10/0 lens 304/4352 e 1 to 0 dl 1566643840 ref 2 fl Interpret:/0/0 rc 0/0 Aug 24 03:52:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 24 06:37:57 fir-md1-s1 kernel: Lustre: 23572:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 24 06:37:57 fir-md1-s1 kernel: Lustre: 23572:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 24 07:00:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 24 07:00:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 07:44:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0551f057-f1fe-3d10-9e95-a12b2c90e4d2 (at 10.9.108.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f148b201800, cur 1566657866 expire 1566657716 last 1566657639 Aug 24 07:44:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 09:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5c84eea4-71ee-5976-8270-60de4ab6fe5c (at 10.9.114.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f226bffb400, cur 1566663362 expire 1566663212 last 1566663135 Aug 24 09:16:02 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 24 09:18:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 24 09:18:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 10:37:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1deb63b5-6b0a-2f25-dd01-9f096e5953ea (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f30b4d5ac00, cur 1566668231 expire 1566668081 last 1566668004 Aug 24 10:37:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 10:37:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ad966fbe-f9d3-f04d-0aff-53e76dcaf407 (at 10.8.11.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0e8196bc00, cur 1566668239 expire 1566668089 last 1566668012 Aug 24 10:37:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 24 10:37:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1deb63b5-6b0a-2f25-dd01-9f096e5953ea (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2dc65c2800, cur 1566668243 expire 1566668093 last 1566668016 Aug 24 10:37:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 24 10:39:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 059c6e07-4514-c446-ee09-8d2aaba1f015 (at 10.8.11.22@o2ib6) Aug 24 10:39:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 10:39:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 24 10:39:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 11:27:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6b9f606c-f89b-328f-c20f-7138b35c7fca (at 10.8.11.28@o2ib6) Aug 24 11:27:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 11:27:11 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.28@o2ib6, removing former export from same NID Aug 24 11:27:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6b9f606c-f89b-328f-c20f-7138b35c7fca (at 10.8.11.28@o2ib6) Aug 24 11:33:15 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 24 11:33:15 fir-md1-s1 kernel: LNetError: 20191:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Aug 24 11:33:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 27ef9320-178f-53ac-b738-4bc2f228a23d (at 10.9.0.63@o2ib4) reconnecting Aug 24 11:33:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 11:33:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to ca62f9dd-676b-9343-5931-7cfc2e4cfe16 (at 10.9.0.63@o2ib4) Aug 24 11:33:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 11:37:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e154d66f-ead9-6f0d-f306-8e351668402d (at 10.9.102.1@o2ib4) Aug 24 11:39:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 
0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 24 11:39:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 11:51:03 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 24 11:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f7119923-112f-182f-2423-969f307f707e (at 10.8.27.35@o2ib6) reconnecting Aug 24 11:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 19effcd6-8030-8ae1-d9d6-24266f7c8d3c (at 10.8.27.35@o2ib6) Aug 24 11:51:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 13:28:40 fir-md1-s1 kernel: Lustre: 23744:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f315a3ad400 x1641922946508880/t0(0) o101->f7119923-112f-182f-2423-969f307f707e@10.8.27.35@o2ib6:15/0 lens 480/568 e 1 to 0 dl 1566678525 ref 2 fl Interpret:/0/0 rc 0/0 Aug 24 14:37:12 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 24 14:37:12 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 24 14:37:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 24 14:37:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 24 14:37:21 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 24 14:37:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 24 14:37:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 24 16:29:36 
fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 24 17:23:11 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 24 17:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f7119923-112f-182f-2423-969f307f707e (at 10.8.27.35@o2ib6) reconnecting Aug 24 17:23:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 19effcd6-8030-8ae1-d9d6-24266f7c8d3c (at 10.8.27.35@o2ib6) Aug 24 17:23:19 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 24 18:38:47 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 24 18:58:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bfb6e805-d5d9-30c1-c57c-f5c9b6f9d250 (at 10.9.103.41@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14dddd9c00, cur 1566698285 expire 1566698135 last 1566698058 Aug 24 18:58:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 24 19:03:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d37611a7-beff-4465-26a4-0cfd871d4dbe (at 10.9.103.41@o2ib4) Aug 24 19:07:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bfb6e805-d5d9-30c1-c57c-f5c9b6f9d250 (at 10.9.103.41@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3954894000, cur 1566698835 expire 1566698685 last 1566698608 Aug 24 19:07:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 19:08:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5ecd3339-79cd-a67e-2a5c-bb3ff2529a3c (at 10.8.27.10@o2ib6) Aug 24 22:55:44 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566712537/real 1566712537] req@ffff8f2e5917a700 x1636779403079216/t0(0) o104->fir-MDT0002@10.9.108.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566712544 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 24 22:55:44 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 24 22:55:51 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566712544/real 1566712544] req@ffff8f2e5917a700 x1636779403079216/t0(0) o104->fir-MDT0002@10.9.108.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566712551 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 24 22:55:52 fir-md1-s1 kernel: Lustre: 10582:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2e61c16900 x1639475635684864/t0(0) o36->f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b@10.8.0.65@o2ib6:27/0 lens 504/448 e 1 to 0 dl 1566712557 ref 2 fl Interpret:/0/0 rc 0/0 Aug 24 22:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 24 22:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 24 22:55:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 24 22:56:05 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566712558/real 1566712558] req@ffff8f2e5917a700 x1636779403079216/t0(0) 
o104->fir-MDT0002@10.9.108.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566712565 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 24 22:56:05 fir-md1-s1 kernel: Lustre: 21415:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 24 22:56:12 fir-md1-s1 kernel: LustreError: 21415:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.108.1@o2ib4) failed to reply to blocking AST (req@ffff8f2e5917a700 x1636779403079216 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f0a088406c0/0x5d9ee6e24d302dc8 lrc: 4/0,0 mode: PR/PR res: [0x2c002bfd6:0x4fa:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.9.108.1@o2ib4 remote: 0xcd73547a05967127 expref: 48 pid: 23756 timeout: 5827654 lvb_type: 0 Aug 24 22:56:12 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.108.1@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 24 22:56:12 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.108.1@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0a088406c0/0x5d9ee6e24d302dc8 lrc: 3/0,0 mode: PR/PR res: [0x2c002bfd6:0x4fa:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.9.108.1@o2ib4 remote: 0xcd73547a05967127 expref: 49 pid: 23756 timeout: 0 lvb_type: 0 Aug 24 22:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5ebfebf6-2249-718b-6c56-049c2cb434a6 (at 10.9.108.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f27190d9400, cur 1566712727 expire 1566712577 last 1566712500 Aug 24 23:04:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to dd475abb-9ad1-0d16-b44c-4ccabf372321 (at 10.9.108.1@o2ib4) Aug 25 01:25:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd1da6eb-2fea-e00e-466d-9521aaee5f09 (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f18115e8800, cur 1566721516 expire 1566721366 last 1566721289 Aug 25 01:25:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 25 01:25:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd1da6eb-2fea-e00e-466d-9521aaee5f09 (at 10.8.28.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f246a9bec00, cur 1566721525 expire 1566721375 last 1566721298 Aug 25 01:25:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 25 01:26:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 290d6608-03d4-0bb1-48e8-288d4a314d54 (at 10.8.28.9@o2ib6) Aug 25 01:26:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 08:10:57 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745850/real 1566745850] req@ffff8f0ba5dd6900 x1636779672222080/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745857 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 25 08:10:57 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745850/real 1566745850] req@ffff8f0f59b1c500 x1636779672222096/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745857 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 25 08:10:57 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 25 08:10:57 fir-md1-s1 kernel: Lustre: 23651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 25 08:11:04 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745857/real 1566745857] req@ffff8f0f59b1c500 x1636779672222096/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745864 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 08:11:04 fir-md1-s1 kernel: 
Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 25 08:11:05 fir-md1-s1 kernel: Lustre: 20541:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1270e53000 x1642462155411328/t0(0) o101->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:10/0 lens 480/568 e 1 to 0 dl 1566745870 ref 2 fl Interpret:/0/0 rc 0/0 Aug 25 08:11:05 fir-md1-s1 kernel: Lustre: 20541:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 25 08:11:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 25 08:11:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 25 08:11:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 08:11:11 fir-md1-s1 kernel: Lustre: 10308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745864/real 1566745864] req@ffff8f055a301b00 x1636779672222128/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745871 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 08:11:11 fir-md1-s1 kernel: Lustre: 10308:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 25 08:11:25 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745878/real 1566745878] req@ffff8f0f59b1c500 x1636779672222096/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745885 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 08:11:25 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Aug 25 08:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 25 08:11:32 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 25 08:11:46 fir-md1-s1 kernel: Lustre: 10308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745899/real 1566745899] req@ffff8f055a301b00 x1636779672222128/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745906 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 08:11:46 fir-md1-s1 kernel: Lustre: 10308:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 25 08:11:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 25 08:11:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 25 08:12:21 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566745934/real 1566745934] req@ffff8f0f59b1c500 x1636779672222096/t0(0) o106->fir-MDT0002@10.9.104.11@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566745941 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 08:12:21 fir-md1-s1 kernel: Lustre: 23687:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Aug 25 08:13:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f6a8b4b1-b3d2-5e16-e9ac-dd4f273eb99c (at 10.9.104.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3d16da4400, cur 1566745983 expire 1566745833 last 1566745756 Aug 25 08:13:03 fir-md1-s1 kernel: Lustre: 10308:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (83:50s); client may timeout. 
req@ffff8f12830d7200 x1642462155412144/t0(0) o101->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:10/0 lens 480/536 e 1 to 0 dl 1566745933 ref 1 fl Complete:/0/0 rc 301/301 Aug 25 08:13:03 fir-md1-s1 kernel: Lustre: 10308:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 25 08:13:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cc5a9ed3-1f35-8bfd-085b-f0ca7acb4d6e (at 10.9.104.11@o2ib4) Aug 25 08:13:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a15e8f7f-c63f-bd44-a967-0ea3bbbe4e4d (at 10.9.104.7@o2ib4) Aug 25 08:13:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 08:14:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 25 08:14:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 08:14:25 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.20@o2ib6, removing former export from same NID Aug 25 08:33:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 28404991-8d4b-34f2-2c6d-5eaed62a4d2d (at 10.9.107.2@o2ib4) Aug 25 08:33:19 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 25 08:34:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 25 08:34:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 09:22:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 38fd5151-2b57-d061-f52d-55c8ccaa0410 (at 10.9.104.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1f9215d400, cur 1566750173 expire 1566750023 last 1566749946 Aug 25 09:22:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 25 09:23:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2de0b757-9b79-a15e-4447-cea1268e488d (at 10.9.104.15@o2ib4) Aug 25 09:23:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 09:25:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 25 09:25:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 10:30:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 65ad8c30-f31b-1ba2-fe35-56d59b3abaff (at 10.9.110.20@o2ib4) Aug 25 10:30:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 10:32:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 25 10:32:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 17:24:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a3a2777a-6dfa-2ab4-052a-910af06dc33b (at 10.8.23.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2f509f9800, cur 1566779076 expire 1566778926 last 1566778849 Aug 25 17:24:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 17:25:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d5adbdfe-b33b-e426-9aa1-aace449bc7cc (at 10.8.23.21@o2ib6) Aug 25 17:25:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 18:33:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c30fe4f7-8e2c-3ef4-9dba-74bcf6789c19 (at 10.9.102.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f06ff82a400, cur 1566783226 expire 1566783076 last 1566782999 Aug 25 18:33:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 18:34:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.20@o2ib4) Aug 25 18:34:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 18:36:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 25 18:36:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 18:56:39 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566784592/real 1566784592] req@ffff8f07dbe13300 x1636780018299376/t0(0) o106->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566784599 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 25 18:56:39 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15 previous similar messages Aug 25 18:56:47 fir-md1-s1 kernel: Lustre: 10589:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0adc8f4800 x1642464800104832/t0(0) o101->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:22/0 lens 480/568 e 1 to 0 dl 1566784612 ref 2 fl Interpret:/0/0 rc 0/0 Aug 25 18:56:47 fir-md1-s1 kernel: Lustre: 10589:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 25 18:56:53 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566784606/real 1566784606] req@ffff8f07dbe13300 x1636780018299376/t0(0) o106->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566784613 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 18:56:53 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Aug 25 18:57:14 fir-md1-s1 kernel: Lustre: 
10253:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566784627/real 1566784627] req@ffff8f07dbe13300 x1636780018299376/t0(0) o106->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566784634 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 18:57:14 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 25 18:57:49 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566784662/real 1566784662] req@ffff8f07dbe13300 x1636780018299376/t0(0) o106->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566784669 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 18:57:49 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Aug 25 18:58:59 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566784732/real 1566784732] req@ffff8f07dbe13300 x1636780018299376/t0(0) o106->fir-MDT0002@10.9.104.13@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1566784739 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 25 18:58:59 fir-md1-s1 kernel: Lustre: 10253:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 25 18:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 25 18:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 25 18:59:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 18:59:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 25 18:59:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 25 18:59:51 fir-md1-s1 
kernel: Lustre: MGS: haven't heard from client 347468c4-d1dd-0186-b62a-6127754f1055 (at 10.9.104.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3fcf676400, cur 1566784791 expire 1566784641 last 1566784564 Aug 25 18:59:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 25 18:59:53 fir-md1-s1 kernel: LNet: Service thread pid 10253 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 25 18:59:53 fir-md1-s1 kernel: Pid: 10253, comm: mdt00_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 25 18:59:53 fir-md1-s1 kernel: Call Trace: Aug 25 18:59:53 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] Aug 25 18:59:53 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] Aug 25 18:59:53 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] Aug 25 18:59:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 25 18:59:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 25 18:59:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 25 18:59:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 25 18:59:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 25 18:59:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566784793.10253 Aug 25 19:00:11 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 25 19:00:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 25 19:00:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c34bd871-feb5-2ab0-d1d1-aed708d4cdb5 (at 10.9.104.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2edaaca400, cur 1566784812 expire 1566784662 last 1566784585 Aug 25 19:00:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 25 19:00:12 fir-md1-s1 kernel: LNet: Service thread pid 10253 completed after 219.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 25 19:00:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 82fad423-e1d2-5dca-b4b1-aed3b4484dc3 (at 10.9.104.13@o2ib4) Aug 26 03:09:39 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 48b5ab04-28b2-4e90-516a-1c78896d0f60 (at 10.9.108.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f27cc30bc00, cur 1566814179 expire 1566814029 last 1566813952 Aug 26 03:11:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c6c59ab0-e988-23a1-cc7e-1da8c8402d65 (at 10.9.108.37@o2ib4) Aug 26 03:11:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 03:12:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 26 03:12:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 04:08:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 17da4e75-34c6-9611-64b7-dd7bf0ceb9f0 (at 10.9.106.40@o2ib4) Aug 26 04:08:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 04:09:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f937866-81ed-1fa0-6cab-7aae3323fc7a (at 10.8.11.20@o2ib6) Aug 26 04:09:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 08:31:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6bff1a2f-bee8-1787-a575-c7c473d78a94 (at 10.8.19.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f233eb25800, cur 1566833469 expire 1566833319 last 1566833242 Aug 26 08:31:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 08:41:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f37bcae1-5968-c42e-e3ff-512bc53b2aa0 (at 10.8.19.5@o2ib6) Aug 26 08:41:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 08:41:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.19.5@o2ib6, removing former export from same NID Aug 26 08:41:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f37bcae1-5968-c42e-e3ff-512bc53b2aa0 (at 10.8.19.5@o2ib6) Aug 26 09:14:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 59154633-4793-a691-e08d-7e8f4ea3ec10 (at 10.9.108.51@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1ab363c400, cur 1566836099 expire 1566835949 last 1566835872 Aug 26 09:14:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 09:16:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e5b93380-d799-cd49-04b1-82827b5a442d (at 10.9.108.51@o2ib4) Aug 26 09:16:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:02:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client de05685f-8ef3-31ac-525f-87a2b93e407e (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2ec639e000, cur 1566838978 expire 1566838828 last 1566838751 Aug 26 10:02:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:05:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2903d57c-5762-79f0-1085-17dddf3a1579 (at 10.8.23.12@o2ib6) Aug 26 10:05:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:08:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bda93780-79cb-e3be-60c9-7b6dd1898955 (at 10.8.13.16@o2ib6) Aug 26 10:08:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:08:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 47a6302d-a743-125a-02b3-313d2bbc1eb9 (at 10.8.12.10@o2ib6) Aug 26 10:08:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:08:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e6013b5a-eca8-d15d-b3e9-9031de40ac97 (at 10.8.6.2@o2ib6) Aug 26 10:08:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:09:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b743407c-1f2a-22f5-529c-bf172a166e4e (at 10.8.2.20@o2ib6) Aug 26 10:09:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Aug 26 10:09:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7dda9c44-ed80-3b50-1ef1-19c95bb443ad (at 10.8.12.24@o2ib6) Aug 26 10:09:12 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 26 10:09:40 fir-md1-s1 kernel: 
Lustre: MGS: Connection restored to 56779f0d-27af-fd72-a61d-e6db037465c2 (at 10.8.13.17@o2ib6) Aug 26 10:09:40 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 26 10:10:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 241ae0ce-771e-d1f2-5299-b08a257d6e25 (at 10.8.4.8@o2ib6) Aug 26 10:10:13 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Aug 26 10:11:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8e631bdb-2479-2f63-53f8-e1ccece79ae0 (at 10.8.4.20@o2ib6) Aug 26 10:11:23 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 26 10:12:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 241ae0ce-771e-d1f2-5299-b08a257d6e25 (at 10.8.4.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f16c65fb400, cur 1566839573 expire 1566839423 last 1566839346 Aug 26 10:12:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:14:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2e638d81-730a-f327-8cc6-e2930a0b34bf (at 10.8.4.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4020f4b400, cur 1566839656 expire 1566839506 last 1566839429 Aug 26 10:14:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 10:27:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 55d9c5e7-e779-f0c6-10dd-4a121388f89a (at 10.8.2.13@o2ib6) Aug 26 10:27:28 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 26 10:28:10 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 10:28:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 26 10:28:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 26 10:28:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 10:29:17 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 10:29:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3afbe2a-3f2b-9c0f-54c8-37380bf10a8b (at 10.8.0.65@o2ib6) reconnecting Aug 26 10:29:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 26 11:07:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 37f1bd8f-ef8d-653c-ed0a-f101a82cabc9 (at 10.8.12.16@o2ib6) Aug 26 11:07:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 11:15:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ba13249b-46c3-0ce8-bdfd-3ad1b846cd20 (at 10.8.12.13@o2ib6) Aug 26 11:15:19 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 26 11:25:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b11fc2aa-8c8e-f7d6-4a59-9a63be991c30 (at 10.8.4.15@o2ib6) Aug 26 11:25:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 26 11:28:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored 
to b1ae69be-b4f3-5205-6e15-c66beead2ae9 (at 10.8.2.3@o2ib6) Aug 26 11:28:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 11:28:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 618e225b-703c-018e-729b-954b32a301fe (at 10.8.4.1@o2ib6) Aug 26 11:28:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 11:30:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to dee8a033-ba5c-cc76-f48b-91e3ba6dbdf9 (at 10.8.6.4@o2ib6) Aug 26 11:30:11 fir-md1-s1 kernel: Lustre: Skipped 182 previous similar messages Aug 26 11:35:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 52f122d9-18ac-3503-cd1f-975f38826699 (at 10.8.6.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f33e4ebdc00, cur 1566844551 expire 1566844401 last 1566844324 Aug 26 11:36:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b11fc2aa-8c8e-f7d6-4a59-9a63be991c30 (at 10.8.4.15@o2ib6) Aug 26 11:36:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 11:47:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c96116ed-f80d-33b0-f328-cdff1d946b78 (at 10.8.3.28@o2ib6) Aug 26 11:47:04 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 26 11:49:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b11fc2aa-8c8e-f7d6-4a59-9a63be991c30 (at 10.8.4.15@o2ib6) Aug 26 11:49:23 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Aug 26 11:54:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1172c21f-93a9-05f9-c5a4-bf1f78a4e7c0 (at 10.8.11.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2dc65c0800, cur 1566845680 expire 1566845530 last 1566845453 Aug 26 11:54:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 11:55:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 18b97217-5ebc-02cb-5a61-f8708379ea46 (at 10.9.108.43@o2ib4) in 203 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2adbc7a400, cur 1566845756 expire 1566845606 last 1566845553 Aug 26 11:55:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 11:57:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f7ddce57-4ab3-630e-d55c-ab3d9b453279 (at 10.8.11.19@o2ib6) Aug 26 11:57:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 12:06:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cc5a9ed3-1f35-8bfd-085b-f0ca7acb4d6e (at 10.9.104.11@o2ib4) Aug 26 12:06:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 26 12:46:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f2595515-8d55-d4e7-ea74-00e6bd9e71d3 (at 10.9.112.9@o2ib4) Aug 26 12:46:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Aug 26 12:47:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f4e109a3-1e90-c790-4690-6ae8b31fae28 (at 10.9.114.13@o2ib4) Aug 26 12:47:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 12:52:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a97cc4f6-6e92-7669-8f75-f73e64eb3df2 (at 10.9.108.35@o2ib4) Aug 26 12:52:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 12:54:13 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:54:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 86b67b75-6c0f-ae81-8e06-d52463bc403e (at 10.8.10.18@o2ib6) reconnecting Aug 26 12:54:30 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:54:30 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 26 12:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b8f1a4cf-96ba-3d55-7e36-def4a84df5b8 (at 10.8.10.35@o2ib6) reconnecting Aug 26 12:54:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar 
message Aug 26 12:54:38 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:54:38 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 26 12:54:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b9e6574-1aa9-db8c-d791-d6abb94c9366 (at 10.8.11.31@o2ib6) reconnecting Aug 26 12:54:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 12:54:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:54:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 125 previous similar messages Aug 26 12:54:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0f823709-144c-78a7-7154-a04e05352243 (at 10.8.11.19@o2ib6) reconnecting Aug 26 12:54:52 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Aug 26 12:55:35 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ce6abe8e-7cfd-421c-6611-9b2b875f8720 (at 10.8.10.1@o2ib6) reconnecting Aug 26 12:55:45 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:55:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a7a4f5ec-5701-92f8-07e6-1f17cc662929 (at 10.8.11.29@o2ib6) reconnecting Aug 26 12:56:16 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:56:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 86b67b75-6c0f-ae81-8e06-d52463bc403e (at 10.8.10.18@o2ib6) reconnecting Aug 26 12:56:50 fir-md1-s1 kernel: LNetError: 
20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 12:56:50 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 26 12:56:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 649dfdf1-1fb1-7425-025a-f7f66812863f (at 10.8.10.29@o2ib6) reconnecting Aug 26 12:56:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 12:56:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7c3f56db-f273-5d44-6d2d-7a51f76d6b18 (at 10.8.10.25@o2ib6) Aug 26 12:56:57 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Aug 26 13:26:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 98b862eb-2092-2bce-5946-abd2c64dd438 (at 10.9.104.46@o2ib4) Aug 26 13:26:08 fir-md1-s1 kernel: Lustre: Skipped 99 previous similar messages Aug 26 14:08:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e1f0b791-b98b-5c09-14c2-a46b6c4a565f (at 10.8.23.17@o2ib6) Aug 26 14:08:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 14:09:03 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.23.17@o2ib6, removing former export from same NID Aug 26 14:28:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0f823709-144c-78a7-7154-a04e05352243 (at 10.8.11.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f0a81598c00, cur 1566854894 expire 1566854744 last 1566854667 Aug 26 14:28:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 14:28:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0f823709-144c-78a7-7154-a04e05352243 (at 10.8.11.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0d4a7b1400, cur 1566854900 expire 1566854750 last 1566854673 Aug 26 14:30:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f7ddce57-4ab3-630e-d55c-ab3d9b453279 (at 10.8.11.19@o2ib6) Aug 26 14:30:37 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 26 14:30:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e41537ef-d236-cd48-36e4-dc178bcaebda (at 10.8.4.21@o2ib6) Aug 26 14:30:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 14:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f17a3ee9-ba99-f937-959b-14a89705699b (at 10.8.11.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1b70d8c400, cur 1566855818 expire 1566855668 last 1566855591 Aug 26 14:43:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 14:46:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f7ddce57-4ab3-630e-d55c-ab3d9b453279 (at 10.8.11.19@o2ib6) Aug 26 14:46:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 14:54:24 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 14:54:24 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 167 previous similar messages Aug 26 14:54:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 705ae766-7496-3e3c-7a4b-0c1f4d988567 (at 10.9.0.1@o2ib4) reconnecting Aug 26 14:54:31 fir-md1-s1 kernel: Lustre: Skipped 89 previous similar messages Aug 26 14:54:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 26 14:54:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 14:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 24f12160-a170-2f71-903f-2464394d70a2 (at 10.9.107.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3e02f11000, cur 1566856760 expire 1566856610 last 1566856533 Aug 26 14:59:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 14:59:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 24f12160-a170-2f71-903f-2464394d70a2 (at 10.9.107.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f11e0303000, cur 1566856764 expire 1566856614 last 1566856537 Aug 26 14:59:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 15:00:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f0edb7ca-7e89-d24c-a709-0dd0dab47a59 (at 10.9.107.26@o2ib4) Aug 26 15:07:46 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 15:07:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ef0748a0-58bc-3624-ed96-74860cd1e591 (at 10.8.0.66@o2ib6) reconnecting Aug 26 15:07:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 810ae33a-f2a4-73ad-b573-a8509a545499 (at 10.8.0.66@o2ib6) Aug 26 15:07:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:10:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1890d675-ce1f-cd8f-dea3-5b5821d43c68 (at 10.8.0.67@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a8c088c00, cur 1566857405 expire 1566857255 last 1566857178 Aug 26 15:15:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e0c94f31-4fd8-0024-8bee-d62de96f3c21 (at 10.8.20.35@o2ib6) Aug 26 15:16:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 892f96ed-71c6-6f5c-9ded-e8ca9c7a55fd (at 10.8.21.13@o2ib6) Aug 26 15:16:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:19:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7fdf582d-4d14-07f2-955b-e05e764c6c0c (at 10.9.103.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e2154f400, cur 1566857982 expire 1566857832 last 1566857755 Aug 26 15:19:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:19:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 05bc7b4b-330f-46b3-487b-be098b37bda6 (at 10.9.103.22@o2ib4) Aug 26 15:19:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:20:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 141bc29e-b4f0-a66d-6d7e-591b430fb859 (at 10.8.21.22@o2ib6) Aug 26 15:20:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:28:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ae69479a-cc7c-26e3-89b7-5ea32bf57e5d (at 10.9.110.38@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3fb2977400, cur 1566858498 expire 1566858348 last 1566858271 Aug 26 15:28:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:29:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2b1a8aee-d7ab-6aae-7e81-bb248b0bf25b (at 10.9.110.38@o2ib4) Aug 26 15:29:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:34:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5c3492d5-1f92-6f13-19c2-fcdc4ec87fac (at 10.9.103.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f34f9a00c00, cur 1566858853 expire 1566858703 last 1566858626 Aug 26 15:34:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 15:34:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c6907ad9-a959-3d3e-2e5c-fbd8331fe965 (at 10.9.103.23@o2ib4) Aug 26 15:34:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:06:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 26 16:06:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:12:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 21ef7a55-c61d-b439-5777-0a637bbab61a (at 10.9.103.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3069ec4800, cur 1566861130 expire 1566860980 last 1566860903 Aug 26 16:12:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:13:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 08a26691-6690-c3ec-c92a-060f765daa32 (at 10.9.103.42@o2ib4) Aug 26 16:13:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:21:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7b2c1ffa-ac37-c88d-7b27-c0d67e17f03c (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f453160d400, cur 1566861677 expire 1566861527 last 1566861450 Aug 26 16:21:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:21:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 40ebe744-82bc-a30e-9343-50eaabccaf84 (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f45172b8800, cur 1566861681 expire 1566861531 last 1566861454 Aug 26 16:21:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 16:23:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 18a19679-982f-f19a-fe36-aedce5eb4405 (at 10.9.108.65@o2ib4) Aug 26 16:23:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:48:27 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 16:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client be3d1aa7-324e-9e86-038a-3377f9e850ee (at 10.8.4.28@o2ib6) reconnecting Aug 26 16:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3d24d033-28ba-5879-05a1-d6ac927e2c7c (at 10.8.4.28@o2ib6) Aug 26 16:48:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 16:48:45 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f43dea69500 x1633944200788688/t360690913798(0) o36->923b3b39-c45c-b3ff-a6cb-68a2326b052e@10.9.101.52@o2ib4:20/0 lens 488/3152 e 1 to 0 dl 1566863330 ref 2 fl Interpret:/0/0 rc 0/0 Aug 26 16:51:13 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 16:51:13 fir-md1-s1 kernel: LNetError: 20185:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 26 16:51:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 705ae766-7496-3e3c-7a4b-0c1f4d988567 (at 10.9.0.1@o2ib4) reconnecting Aug 26 16:51:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 16:51:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 26 16:51:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 17:03:50 fir-md1-s1 kernel: LNetError: 
20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:03:51 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:03:51 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Aug 26 17:03:52 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:03:52 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 17 previous similar messages Aug 26 17:03:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4f0dedb2-4da8-c70c-e108-3ad690202f4d (at 10.8.12.11@o2ib6) reconnecting Aug 26 17:03:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 68002702-3a8e-1651-a296-80559f2e0c33 (at 10.8.12.11@o2ib6) Aug 26 17:03:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 05efda7a-a657-27ce-19b0-d07b4135103b (at 10.8.12.26@o2ib6) reconnecting Aug 26 17:03:58 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 26 17:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ce6abe8e-7cfd-421c-6611-9b2b875f8720 (at 10.8.10.1@o2ib6) reconnecting Aug 26 17:03:59 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Aug 26 17:04:01 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:04:01 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 29 previous similar messages Aug 26 17:04:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bc91bf51-d795-b3ca-630a-f6567cc4adb4 (at 10.8.2.12@o2ib6) reconnecting Aug 26 17:04:08 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 26 17:04:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 
db9649b3-6cc3-cccf-ca1e-3c3a4f9ec89c (at 10.8.2.12@o2ib6) Aug 26 17:04:08 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Aug 26 17:04:09 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:04:09 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 24 previous similar messages Aug 26 17:04:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 039b534c-8a5a-ec1c-42ac-214c428c2518 (at 10.8.11.3@o2ib6) reconnecting Aug 26 17:04:16 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 26 17:04:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 60db27b9-2971-c9b5-30e0-0b9cb8d03be0 (at 10.8.11.3@o2ib6) Aug 26 17:04:16 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Aug 26 17:04:23 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:04:23 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 127 previous similar messages Aug 26 17:04:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c5fb118d-3a3d-5f46-1991-7d81724a70dc (at 10.8.6.1@o2ib6) reconnecting Aug 26 17:04:30 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages Aug 26 17:04:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to df5fa5ed-21a8-0513-dd32-6b4b409076bf (at 10.8.11.17@o2ib6) Aug 26 17:04:38 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages Aug 26 17:09:00 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 26 17:09:00 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 26 17:09:24 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't 
perform health checking (-125, 0) Aug 26 17:09:24 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 15 previous similar messages Aug 26 17:09:27 fir-md1-s1 kernel: Lustre: 23661:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 26 17:09:27 fir-md1-s1 kernel: Lustre: 23661:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 21 previous similar messages Aug 26 17:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 705ae766-7496-3e3c-7a4b-0c1f4d988567 (at 10.9.0.1@o2ib4) reconnecting Aug 26 17:09:31 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 26 17:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 83868a67-b645-b6f1-0ec2-04638d68d77a (at 10.9.0.1@o2ib4) Aug 26 17:09:31 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Aug 26 17:13:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 26 17:18:40 fir-md1-s1 kernel: LNetError: 20196:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:18:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3ad7febb-b12e-83e9-5d00-643d11a63aab (at 10.9.103.20@o2ib4) reconnecting Aug 26 17:18:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aa2745c5-97d2-57ac-6c17-b0bea41f7eec (at 10.9.103.20@o2ib4) Aug 26 17:18:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 17:20:46 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6282e924-823c-ee43-6de9-1b6a734cef6f (at 10.8.0.67@o2ib6) reconnecting Aug 26 17:23:34 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 26 17:23:41 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6282e924-823c-ee43-6de9-1b6a734cef6f (at 10.8.0.67@o2ib6) reconnecting Aug 26 17:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 26 17:23:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 26 17:28:33 fir-md1-s1 kernel: Lustre: 23687:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 17:28:33 fir-md1-s1 kernel: Lustre: 23687:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 110 previous similar messages Aug 26 17:32:18 fir-md1-s1 kernel: Lustre: 23661:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 17:32:18 fir-md1-s1 kernel: Lustre: 23661:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Aug 26 17:48:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6068169c-7e43-cb48-702e-c86ea109eb0e (at 10.8.4.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f333d429400, cur 1566866915 expire 1566866765 last 1566866688 Aug 26 17:48:35 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 26 17:48:58 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 17:48:58 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages Aug 26 18:01:13 fir-md1-s1 kernel: Lustre: 23557:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 18:01:13 fir-md1-s1 kernel: Lustre: 23557:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Aug 26 18:06:44 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 18:06:44 fir-md1-s1 kernel: Lustre: 21416:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 26 18:12:45 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 18:12:45 fir-md1-s1 kernel: Lustre: 10501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Aug 26 18:16:41 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 18:16:41 fir-md1-s1 kernel: Lustre: 10304:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Aug 26 18:34:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2eceea9b-696b-ba64-0ec1-6ae08dc268bd (at 10.8.6.26@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f33d2fb1800, cur 1566869649 expire 1566869499 last 1566869422 Aug 26 18:34:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 18:55:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 73650127-be6f-11d7-6372-bc03c4ceab4c (at 10.8.6.5@o2ib6) Aug 26 18:55:44 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 26 19:49:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 70dda6ff-d22c-cb19-723a-3fdbd1b01db4 (at 10.9.110.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3fb2970800, cur 1566874195 expire 1566874045 last 1566873968 Aug 26 19:49:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 19:50:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 89bf8d6a-6cf4-4736-f754-0e614d3d51d0 (at 10.9.110.8@o2ib4) Aug 26 19:50:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 20:04:26 fir-md1-s1 kernel: Lustre: 23618:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 26 20:04:26 fir-md1-s1 kernel: Lustre: 23618:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages Aug 26 20:50:01 fir-md1-s1 kernel: Lustre: 23573:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 26 22:07:15 fir-md1-s1 kernel: Lustre: 27320:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Aug 26 22:46:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 981955f8-f668-b378-4a24-450b17189ee9 (at 10.8.4.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f1af26e5800, cur 1566884807 expire 1566884657 last 1566884580 Aug 26 22:46:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 23:26:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b226c793-3f36-034a-b668-7f1ccd46189a (at 10.8.4.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1a97bbb400, cur 1566887186 expire 1566887036 last 1566886959 Aug 26 23:26:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 23:29:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 59cd89e5-1d76-bc8e-3b8b-e39b6abc2de1 (at 10.8.6.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1982f95400, cur 1566887380 expire 1566887230 last 1566887153 Aug 26 23:29:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 23:30:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f8854773-9aa6-f1e7-6350-50dcf071c80b (at 10.8.6.31@o2ib6) in 221 seconds. I think it's dead, and I am evicting it. exp ffff8f34dbac0800, cur 1566887456 expire 1566887306 last 1566887235 Aug 26 23:30:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 26 23:31:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client bf653391-086e-29b1-8e49-e032da7e4ce1 (at 10.8.6.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3c25e14400, cur 1566887462 expire 1566887312 last 1566887235 Aug 27 00:22:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fbc0dca6-84da-d8d3-2ab3-fe362dd60410 (at 10.9.106.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4275963000, cur 1566890540 expire 1566890390 last 1566890313 Aug 27 00:22:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 01:36:32 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 01:36:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 357ed5e6-797d-063b-772c-730368f05495 (at 10.9.103.26@o2ib4) reconnecting Aug 27 01:36:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ba0afaf6-d34f-7ddc-1fad-47d42cf13f2f (at 10.9.103.26@o2ib4) Aug 27 01:36:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 02:56:24 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 27 04:23:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3519619d-e5e2-daa2-4817-b520e8ef39da (at 10.9.110.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f064a74d800, cur 1566905017 expire 1566904867 last 1566904790 Aug 27 04:23:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 04:24:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 762837ee-4f7e-c137-7d45-61b91130a40c (at 10.9.110.13@o2ib4) Aug 27 06:08:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b1ae69be-b4f3-5205-6e15-c66beead2ae9 (at 10.8.2.3@o2ib6) Aug 27 06:08:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 06:25:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 06:25:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 06:25:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6016069c-5084-534f-69c6-fe647b2dccae (at 10.9.102.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f19f3d3a400, cur 1566912315 expire 1566912165 last 1566912088 Aug 27 06:25:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 09:49:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 09:49:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 09:50:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5890eb4b-33c1-2ed3-4d2b-60df28cbaad8 (at 10.8.8.25@o2ib6) reconnecting Aug 27 09:50:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.8.25@o2ib6) Aug 27 09:50:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 10:09:18 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 27 10:09:18 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 27 10:23:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.20@o2ib4) Aug 27 10:29:00 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:29:00 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 10:29:01 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 59ce9f0a-1531-034c-ec25-e5e6599c459f (at 10.8.3.11@o2ib6) reconnecting Aug 27 10:29:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2fe0f0d3-9135-aba7-5c24-bc36cb198e87 (at 10.8.6.9@o2ib6) Aug 27 10:29:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 10:29:07 fir-md1-s1 kernel: Lustre: Skipped 2 
previous similar messages Aug 27 10:29:07 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:29:07 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 10:29:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7a70cc17-4e83-fcc0-1753-2099f41ffee4 (at 10.8.2.23@o2ib6) reconnecting Aug 27 10:29:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 59ce9f0a-1531-034c-ec25-e5e6599c459f (at 10.8.3.11@o2ib6) reconnecting Aug 27 10:29:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7fa9f296-7620-d7bb-323f-18f49309df01 (at 10.8.3.11@o2ib6) Aug 27 10:29:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 10:29:25 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:29:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7fae7056-7652-02b2-3c4d-31fb25e8b1c1 (at 10.8.2.21@o2ib6) reconnecting Aug 27 10:29:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 05a868b9-dd47-4e82-eb45-e3429e5b7cf6 (at 10.8.2.21@o2ib6) Aug 27 10:47:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f34eff35-9f31-0888-c4bb-e6f93e879de4 (at 10.9.108.17@o2ib4) Aug 27 10:47:20 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 10:47:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 77c86a88-2e06-0520-8e5c-64367624bd0c (at 10.9.108.19@o2ib4) Aug 27 10:47:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 10:55:29 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566928521/real 1566928522] req@ffff8f398b77c200 x1636782179983632/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1566928529 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 27 
10:55:29 fir-md1-s1 kernel: Lustre: 23614:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Aug 27 10:55:34 fir-md1-s1 kernel: LNet: 20185:0:(o2iblnd_cb.c:408:kiblnd_handle_rx()) PUT_NACK from 10.0.10.3@o2ib7 Aug 27 10:55:36 fir-md1-s1 kernel: Lustre: 50576:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f4375661500 x1631583641445696/t0(0) o36->8c44a420-9990-75c1-2b64-64b6fe5d1b1b@10.9.102.27@o2ib4:11/0 lens 504/2888 e 1 to 0 dl 1566928541 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 10:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 8c44a420-9990-75c1-2b64-64b6fe5d1b1b (at 10.9.102.27@o2ib4) reconnecting Aug 27 10:55:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 10:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to af852138-14c6-678b-c55f-677c64fab09c (at 10.9.102.27@o2ib4) Aug 27 10:55:42 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 27 10:55:47 fir-md1-s1 kernel: Lustre: 23683:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f0aa8d87b00 x1631571815876768/t0(0) o36->4b6e4105-ad27-7331-49d4-b54bb82f1685@10.9.105.21@o2ib4:22/0 lens 512/2888 e 0 to 0 dl 1566928552 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 10:55:47 fir-md1-s1 kernel: Lustre: 23683:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Aug 27 10:55:49 fir-md1-s1 kernel: Lustre: 23660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1116671e00 x1634135448774496/t0(0) o101->eef3e7bf-8b9f-8c5b-c710-00e4798713e4@10.9.104.71@o2ib4:24/0 lens 480/568 e 0 to 0 dl 1566928554 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 10:55:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4b6e4105-ad27-7331-49d4-b54bb82f1685 (at 10.9.105.21@o2ib4) reconnecting Aug 27 10:55:53 fir-md1-s1 kernel: Lustre: Skipped 1 
previous similar message Aug 27 10:55:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2ec46734-7ea0-647e-5e75-412c723b42cb (at 10.9.105.21@o2ib4) Aug 27 10:55:54 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566928546/real 1566928546] req@ffff8f16b8b59500 x1636782179983776/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1566928554 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Aug 27 10:55:54 fir-md1-s1 kernel: Lustre: 23758:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Aug 27 10:55:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client eef3e7bf-8b9f-8c5b-c710-00e4798713e4 (at 10.9.104.71@o2ib4) reconnecting Aug 27 10:56:03 fir-md1-s1 kernel: LustreError: 21670:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff8f194c039500 x1636782179992784 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f0f43380d80/0x5d9ee6e655a45c71 lrc: 4/0,0 mode: PR/PR res: [0x200029909:0x3db:0x0].0x0 bits 0x5b/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x283582c681bfa099 expref: 9465 pid: 23580 timeout: 6043636 lvb_type: 0 Aug 27 10:56:03 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110 Aug 27 10:56:03 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 39s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0000_UUID lock: ffff8f0f43380d80/0x5d9ee6e655a45c71 lrc: 3/0,0 mode: PR/PR res: [0x200029909:0x3db:0x0].0x0 bits 0x5b/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x283582c681bfa099 expref: 7078 pid: 23580 timeout: 0 lvb_type: 0 Aug 27 10:56:03 fir-md1-s1 kernel: LustreError: 30995:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 
10.0.10.3@o2ib7 arrived at 1566928563 with bad export cookie 6746083106902250542 Aug 27 10:56:03 fir-md1-s1 kernel: LustreError: 30995:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 227 previous similar messages Aug 27 10:56:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 27 10:56:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:56:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 27 10:56:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) reconnecting Aug 27 10:56:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 27 10:56:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:56:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:57:49 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:57:49 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages Aug 27 10:57:53 fir-md1-s1 kernel: Lustre: 30997:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1145786300 x1642476556399888/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/192 e 1 to 0 dl 1566928678 ref 2 fl Complete:/2/0 rc 116/116 Aug 27 10:57:55 fir-md1-s1 kernel: Lustre: 23624:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f1f43360c00 x1642476553764512/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:55 fir-md1-s1 kernel: Lustre: 
23624:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages Aug 27 10:57:56 fir-md1-s1 kernel: Lustre: 27443:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f16588b7200 x1642476551684416/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:25/0 lens 336/0 e 0 to 0 dl 1566928675 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:56 fir-md1-s1 kernel: Lustre: 31055:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f179228aa00 x1642476551664784/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:56 fir-md1-s1 kernel: Lustre: 31055:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 392 previous similar messages Aug 27 10:57:56 fir-md1-s1 kernel: Lustre: 27443:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Aug 27 10:57:57 fir-md1-s1 kernel: Lustre: 20366:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f1e6dfe9e00 x1642476566550176/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Aug 27 10:57:57 fir-md1-s1 kernel: Lustre: 20366:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 46 previous similar messages Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b3018f10-df89-8839-02ca-152df77c3a17 (at 10.9.104.65@o2ib4) reconnecting Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 27444:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f0dd9897850 x1642476553866624/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:26/0 lens 336/0 e 0 to 0 dl 1566928676 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to eb7be6b4-e3f8-1635-1d34-c0a92d3ade46 (at 10.9.104.65@o2ib4) Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 27444:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 116 previous similar messages Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 25085:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=76 reqQ=11154 recA=0, svcEst=20, delay=941 Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 25085:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 31033:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f06504e4800 x1642476559141600/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:3/0 lens 336/0 e 1 to 0 dl 1566928683 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 31033:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32090 previous similar messages Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 31055:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=369 reqQ=11008 recA=1, svcEst=20, delay=613 Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 31055:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 3 previous similar messages Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 31055:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f233e5f7b00 x1642476558040320/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:26/0 lens 336/0 e 0 to 0 dl 1566928676 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:58 fir-md1-s1 kernel: Lustre: 31055:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 741 previous similar messages Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 9s req@ffff8f22a6a36300 x1642476551623536/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1142 previous similar messages Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: 31033:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f0f71d5bf00 x1642476567626976/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/192 e 1 to 0 dl 1566928678 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: 25550:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1247 reqQ=11291 recA=0, svcEst=20, delay=54 Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: 25550:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 2 previous similar messages Aug 27 10:57:59 fir-md1-s1 kernel: LustreError: 25550:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:1s ago req@ffff8f0815f39200 x1642476556397872/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 10:57:59 fir-md1-s1 kernel: LustreError: 25550:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 17 previous similar messages Aug 27 10:57:59 fir-md1-s1 kernel: Lustre: 25550:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff8f0815f39200 x1642476556397872/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 10:58:01 fir-md1-s1 kernel: Lustre: 20369:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f15a7f78f00 x1642311794960400/t0(0) o103->17fa2f85-b498-6aea-0e9b-b4cd8046edb1@10.9.115.10@o2ib4:27/0 lens 328/0 e 0 to 0 dl 1566928677 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:01 fir-md1-s1 kernel: Lustre: 20369:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 887 previous similar messages Aug 27 10:58:01 fir-md1-s1 kernel: LustreError: 30998:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:3s ago req@ffff8f0815f39500 x1642476567111200/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:01 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:3s); client may timeout. req@ffff8f0815f39500 x1642476567111200/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:02 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 10:58:02 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 27 10:58:02 fir-md1-s1 kernel: Lustre: 42521:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1343 reqQ=12904 recA=0, svcEst=20, delay=804 Aug 27 10:58:02 fir-md1-s1 kernel: Lustre: 42521:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 7 previous similar messages Aug 27 10:58:02 fir-md1-s1 kernel: LustreError: 21300:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:4s ago req@ffff8f0815f39800 x1642476567346352/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:02 fir-md1-s1 kernel: LustreError: 21300:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 27 10:58:03 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:58:03 fir-md1-s1 kernel: Lustre: 23101:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 12s req@ffff8f11c66bd400 x1642476552966048/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 10:58:03 fir-md1-s1 kernel: Lustre: 38786:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 13s req@ffff8f06df083f00 x1642476566310480/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:03 fir-md1-s1 kernel: Lustre: 38786:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1896 previous similar messages Aug 27 10:58:03 fir-md1-s1 kernel: Lustre: 23101:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 8 previous similar messages Aug 27 10:58:04 fir-md1-s1 kernel: Lustre: 21494:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer 
than estimated (20:6s); client may timeout. req@ffff8f0815f39e00 x1642476554755584/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 10:58:04 fir-md1-s1 kernel: Lustre: 21494:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Aug 27 10:58:05 fir-md1-s1 kernel: Lustre: 24177:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-8s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f275125d400 x1642476566673936/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:27/0 lens 336/0 e 0 to 0 dl 1566928677 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:05 fir-md1-s1 kernel: Lustre: 24177:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 2789 previous similar messages Aug 27 10:58:05 fir-md1-s1 kernel: LustreError: 24177:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:7s ago req@ffff8f0815f3c200 x1642476566773024/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:05 fir-md1-s1 kernel: LustreError: 24177:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2330a7a450 x1642476566480592/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:11/0 lens 336/0 e 1 to 0 dl 1566928691 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32415 previous similar messages Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: Skipped 51 previous similar messages Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: 25078:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=300 reqQ=15492 recA=0, svcEst=20, delay=165 Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: 25078:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 50 previous similar messages Aug 27 10:58:06 fir-md1-s1 kernel: LustreError: 24564:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f3259a8ac50 x1642960511464576/t0(0) o4->01c5290e-2f99-d714-0fa9-403481192ee7@10.9.103.1@o2ib4:6/0 lens 488/448 e 0 to 0 dl 1566928716 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 10:58:06 fir-md1-s1 kernel: LustreError: 24564:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 01c5290e-2f99-d714-0fa9-403481192ee7 (at 10.9.103.1@o2ib4), client will retry: rc = -110 Aug 27 10:58:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 10:58:08 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:4s); client may timeout. 
req@ffff8f2267f4c800 x1642476552292976/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:4/0 lens 336/0 e 1 to 0 dl 1566928684 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 10:58:08 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 54 previous similar messages Aug 27 10:58:09 fir-md1-s1 kernel: LustreError: 21486:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:5s ago req@ffff8f2267f4fb00 x1642476554580448/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:4/0 lens 336/0 e 1 to 0 dl 1566928684 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 10:58:09 fir-md1-s1 kernel: LustreError: 21486:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 80 previous similar messages Aug 27 10:58:12 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 10:58:12 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 10:58:13 fir-md1-s1 kernel: Lustre: 21300:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-12s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f0cabb58300 x1642476557855040/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:1/0 lens 336/0 e 0 to 0 dl 1566928681 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 10:58:13 fir-md1-s1 kernel: Lustre: 21300:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 11543 previous similar messages Aug 27 10:58:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 42927d96-a0db-4c87-102c-7dc7fa7db0e1 (at 10.8.11.27@o2ib6) reconnecting Aug 27 10:58:13 fir-md1-s1 kernel: Lustre: Skipped 137 previous similar messages Aug 27 10:58:14 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 10:58:14 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Aug 27 10:58:14 fir-md1-s1 kernel: Lustre: 25078:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=541 reqQ=23904 recA=0, svcEst=20, delay=503 Aug 27 10:58:14 fir-md1-s1 kernel: Lustre: 25078:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 102 previous similar messages Aug 27 10:58:16 fir-md1-s1 kernel: Lustre: 27442:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:18s); client may timeout. req@ffff8f0815f3f800 x1642476567321408/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:16 fir-md1-s1 kernel: Lustre: 27442:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 113 previous similar messages Aug 27 10:58:17 fir-md1-s1 kernel: LustreError: 30997:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:19s ago req@ffff8f0ceb7ce000 x1642476566919440/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:17 fir-md1-s1 kernel: LustreError: 30997:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 78 previous similar messages Aug 27 10:58:19 fir-md1-s1 kernel: Lustre: 23101:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 17s req@ffff8f24768cf850 x1642476566097200/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:19 fir-md1-s1 kernel: Lustre: 23101:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 19669 previous similar messages Aug 27 10:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 10:58:19 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.55@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e2421cc80/0x5d9ee6e65748fb25 lrc: 3/0,0 mode: EX/EX res: [0x2c002cce0:0x23b3:0x0].0x0 bits 0x8/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.9.101.55@o2ib4 remote: 0x29051da4b232a282 expref: 13352 pid: 97667 timeout: 6043759 lvb_type: 3 Aug 27 10:58:22 fir-md1-s1 kernel: Lustre: 31007:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2699815850 x1642476566006560/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:27/0 lens 336/0 e 1 to 0 dl 1566928707 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:22 fir-md1-s1 kernel: Lustre: 31007:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15857 previous similar messages Aug 27 10:58:23 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.104.66@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1358e76e40/0x5d9ee6e6574a5cf2 lrc: 3/0,0 mode: CR/CR res: [0x2c002cc7c:0x1c554:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.104.66@o2ib4 remote: 0xa979a7400393f21b expref: 2166 pid: 23558 timeout: 6043762 lvb_type: 0 Aug 27 10:58:23 fir-md1-s1 kernel: LustreError: 36726:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1181ba5400 x1636782180404384/t0(0) o105->fir-MDT0002@10.9.101.55@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 10:58:24 fir-md1-s1 kernel: LustreError: 20728:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1488f67c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f1e50d75a00/0x5d9ee6e6574a6320 lrc: 1/0,0 mode: EX/EX res: [0x2c002cc7c:0x1c554:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.9.104.66@o2ib4 remote: 0xa979a7400393f237 expref: 1534 pid: 20728 timeout: 
0 lvb_type: 3 Aug 27 10:58:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 27 10:58:29 fir-md1-s1 kernel: Lustre: Skipped 310 previous similar messages Aug 27 10:58:29 fir-md1-s1 kernel: Lustre: 27440:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-19s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f0754352a00 x1642476565746944/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:10/0 lens 336/0 e 0 to 0 dl 1566928690 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:29 fir-md1-s1 kernel: Lustre: 27440:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 14652 previous similar messages Aug 27 10:58:30 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 10:58:30 fir-md1-s1 kernel: Lustre: Skipped 166 previous similar messages Aug 27 10:58:30 fir-md1-s1 kernel: Lustre: 25085:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=114 reqQ=34661 recA=0, svcEst=20, delay=323 Aug 27 10:58:30 fir-md1-s1 kernel: Lustre: 25085:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 166 previous similar messages Aug 27 10:58:30 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.35@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f20bbb17080/0x5d9ee6e6574ac4d3 lrc: 3/0,0 mode: CR/CR res: [0x2c002cd29:0x1c2c3:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.35@o2ib6 remote: 0x547e4419d68ad301 expref: 176557 pid: 97646 timeout: 6043770 lvb_type: 0 Aug 27 10:58:30 fir-md1-s1 kernel: LustreError: 117778:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f16b8b58f00 x1636782180410896/t0(0) o105->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 10:58:31 fir-md1-s1 kernel: LustreError: 
97661:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4115283400 ns: mdt-fir-MDT0002_UUID lock: ffff8f1b63689680/0x5d9ee6e6574d566a lrc: 3/0,0 mode: PW/PW res: [0x2c002cca8:0x1874:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.8.27.35@o2ib6 remote: 0x547e4419d68ad507 expref: 176548 pid: 97661 timeout: 0 lvb_type: 0 Aug 27 10:58:31 fir-md1-s1 kernel: LustreError: 97661:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Aug 27 10:58:32 fir-md1-s1 kernel: Lustre: 38786:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:34s); client may timeout. req@ffff8f0ca12dc500 x1642476567520128/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:32 fir-md1-s1 kernel: Lustre: 38786:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4655 previous similar messages Aug 27 10:58:33 fir-md1-s1 kernel: Lustre: 20205:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566928706/real 1566928706] req@ffff8f0e9fe18300 x1636782180405728/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 0 to 1 dl 1566928713 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 27 10:58:33 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 10:58:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 10:58:33 fir-md1-s1 kernel: Lustre: 20205:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 40 previous similar messages Aug 27 10:58:33 fir-md1-s1 kernel: LustreError: 21305:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:21s ago req@ffff8f1a9ae0bf00 
x1642476566085312/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:12/0 lens 336/0 e 1 to 0 dl 1566928692 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:58:33 fir-md1-s1 kernel: LustreError: 21305:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 2728 previous similar messages Aug 27 10:58:34 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3e04e78480/0x5d9ee6e6574c3652 lrc: 3/0,0 mode: PR/PR res: [0x2c002cc63:0x7c1e:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0x2d1dd34917eb7505 expref: 487041 pid: 23587 timeout: 6043774 lvb_type: 0 Aug 27 10:58:36 fir-md1-s1 kernel: LustreError: 31465:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.101.55@o2ib4 arrived at 1566928700 with bad export cookie 6746082392665407964 Aug 27 10:58:36 fir-md1-s1 kernel: LustreError: 31465:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1401 previous similar messages Aug 27 10:58:37 fir-md1-s1 kernel: LustreError: 21446:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2b472d5400 ns: mdt-fir-MDT0002_UUID lock: ffff8f201b09e780/0x5d9ee6e6574c6894 lrc: 3/0,0 mode: PW/PW res: [0x2c002cd27:0x4719:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x50200000000000 nid: 10.9.106.13@o2ib4 remote: 0x91b6dc0f688f335a expref: 811 pid: 21446 timeout: 0 lvb_type: 0 Aug 27 10:58:37 fir-md1-s1 kernel: LustreError: 21446:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Aug 27 10:58:40 fir-md1-s1 kernel: LustreError: 50448:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f1e0d32d800 ns: mdt-fir-MDT0002_UUID lock: ffff8f207d430fc0/0x5d9ee6e6574f1b4f lrc: 3/0,0 mode: PW/PW res: [0x2c002ccac:0x16c5a:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.8.17.12@o2ib6 
remote: 0xb9a0d20ccf7d342a expref: 3854 pid: 50448 timeout: 0 lvb_type: 0 Aug 27 10:58:40 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 10:58:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:58:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 10:58:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 10:58:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:58:40 fir-md1-s1 kernel: LustreError: 21389:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1729ecb850 x1641909908031648/t0(0) o4->f3b5bffb-e293-ee70-1624-35dbc39ab3e5@10.8.1.34@o2ib6:16/0 lens 488/448 e 0 to 0 dl 1566928726 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 10:58:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f3b5bffb-e293-ee70-1624-35dbc39ab3e5 (at 10.8.1.34@o2ib6), client will retry: rc = -110 Aug 27 10:58:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.104.57@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f36d3ef6300/0x5d9ee6e656e43244 lrc: 3/0,0 mode: CR/CR res: [0x2c002cc9c:0x67ef:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x60000400000020 nid: 10.9.104.57@o2ib4 remote: 0x686f4d54c1a25671 expref: 2786 pid: 23677 timeout: 6043783 lvb_type: 3 Aug 27 10:58:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Aug 27 10:58:43 fir-md1-s1 kernel: LustreError: 117780:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f13c9024e00 x1636782180422512/t0(0) o105->fir-MDT0002@10.9.104.57@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 
27 10:58:43 fir-md1-s1 kernel: LustreError: 117780:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Aug 27 10:58:43 fir-md1-s1 kernel: LustreError: 97668:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f34fd378800 ns: mdt-fir-MDT0002_UUID lock: ffff8f1cf3138900/0x5d9ee6e657501ef7 lrc: 3/0,0 mode: PW/PW res: [0x2c002cc9c:0x67f2:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.9.104.57@o2ib4 remote: 0x686f4d54c1a25b95 expref: 2781 pid: 97668 timeout: 0 lvb_type: 0 Aug 27 10:58:44 fir-md1-s1 kernel: LustreError: 97668:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Aug 27 10:58:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 49375ad1-0490-18e3-7c5c-c4e2e3d502ef (at 10.8.18.20@o2ib6) reconnecting Aug 27 10:58:46 fir-md1-s1 kernel: Lustre: Skipped 329 previous similar messages Aug 27 10:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 10:58:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:58:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 10:58:51 fir-md1-s1 kernel: Lustre: 20367:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 29s req@ffff8f1c805b5450 x1642613901889888/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:51 fir-md1-s1 kernel: Lustre: 23101:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 29s req@ffff8f1fa3baa700 x1642613901884560/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Aug 27 10:58:51 fir-md1-s1 kernel: Lustre: 23101:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 59011 
previous similar messages Aug 27 10:58:51 fir-md1-s1 kernel: Lustre: 20367:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 916 previous similar messages Aug 27 10:58:52 fir-md1-s1 kernel: LustreError: 21181:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f252344f400 ns: mdt-fir-MDT0000_UUID lock: ffff8f2d9404a1c0/0x5d9ee6e6574e5bbd lrc: 1/0,0 mode: EX/EX res: [0x200029cc9:0x2d6:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.8.19.7@o2ib6 remote: 0xf99a11858d3b983d expref: 5 pid: 21181 timeout: 0 lvb_type: 3 Aug 27 10:58:52 fir-md1-s1 kernel: LustreError: 21181:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Aug 27 10:58:54 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 10:58:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 10:58:55 fir-md1-s1 kernel: Lustre: 31014:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f25d2f7ef00 x1641909908028272/t0(0) o103->f3b5bffb-e293-ee70-1624-35dbc39ab3e5@10.8.1.34@o2ib6:0/0 lens 328/0 e 1 to 0 dl 1566928740 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 10:58:55 fir-md1-s1 kernel: Lustre: 31014:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 37930 previous similar messages Aug 27 10:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 10:59:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 27 10:59:01 fir-md1-s1 kernel: Lustre: 38786:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-13s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f10ba8ccb00 x1642476563329664/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:18/0 lens 336/0 e 0 to 0 dl 1566928728 ref 2 fl New:/0/ffffffff rc 0/-1 Aug 27 10:59:01 fir-md1-s1 kernel: Lustre: 38786:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 94353 previous similar messages Aug 27 10:59:03 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 10:59:03 fir-md1-s1 kernel: Lustre: 22893:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=62 reqQ=7205 recA=0, svcEst=20, delay=821 Aug 27 10:59:03 fir-md1-s1 kernel: Lustre: 22893:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1207 previous similar messages Aug 27 10:59:03 fir-md1-s1 kernel: Lustre: Skipped 1219 previous similar messages Aug 27 10:59:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.3@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 10:59:04 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 27 10:59:04 fir-md1-s1 kernel: Lustre: 30994:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:66s); client may timeout. 
req@ffff8f0fdbd94800 x1642476566986448/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566928678 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 10:59:04 fir-md1-s1 kernel: Lustre: 30994:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5592 previous similar messages Aug 27 10:59:05 fir-md1-s1 kernel: LustreError: 20369:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.10.16@o2ib6: deadline 6:20s ago req@ffff8f17b58d0300 x1642682250272176/t0(0) o103->881cae52-076f-8fb7-508c-ade7bb964ea2@10.8.10.16@o2ib6:15/0 lens 328/0 e 0 to 0 dl 1566928725 ref 1 fl Interpret:H/0/ffffffff rc 0/-1 Aug 27 10:59:05 fir-md1-s1 kernel: LustreError: 20369:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 3475 previous similar messages Aug 27 10:59:08 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 10:59:08 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 27 10:59:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.24.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f1632603cc0/0x5d9ee6e652f955db lrc: 3/0,0 mode: PR/PR res: [0x200000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 674 type: IBT flags: 0x60200400000020 nid: 10.8.24.14@o2ib6 remote: 0x75e8937986160a66 expref: 132 pid: 22280 timeout: 6043808 lvb_type: 0 Aug 27 10:59:09 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Aug 27 10:59:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 10:59:21 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 27 10:59:29 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to 
fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 10:59:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Aug 27 10:59:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b1cc3eab-efb4-303a-d19d-56b06e7b1c70 (at 10.8.13.22@o2ib6) Aug 27 10:59:33 fir-md1-s1 kernel: Lustre: Skipped 1307 previous similar messages Aug 27 10:59:35 fir-md1-s1 kernel: LustreError: 23454:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928685, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2fc3e44a40/0x5d9ee6e6574c3aea lrc: 3/0,1 mode: --/PW res: [0x2c002cc63:0x7c1e:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23454 timeout: 0 lvb_type: 0 Aug 27 10:59:37 fir-md1-s1 kernel: LustreError: 20727:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928687, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f19016fb3c0/0x5d9ee6e6574c72b2 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e6574c72c0 expref: -99 pid: 20727 timeout: 0 lvb_type: 0 Aug 27 10:59:37 fir-md1-s1 kernel: LustreError: 20727:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 27 10:59:43 fir-md1-s1 kernel: Lustre: 20211:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566928776/real 1566928776] req@ffff8f0707855a00 x1636782180409376/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 0 to 1 dl 1566928783 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 10:59:43 fir-md1-s1 kernel: Lustre: 20211:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 565 previous similar messages Aug 27 10:59:46 fir-md1-s1 
kernel: LustreError: 21436:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928696, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3665696300/0x5d9ee6e6574d530d lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 38 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21436 timeout: 0 lvb_type: 0 Aug 27 10:59:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2c9ea0d7-9801-4a86-9152-4d85f70d5266 (at 10.8.12.25@o2ib6) reconnecting Aug 27 10:59:50 fir-md1-s1 kernel: Lustre: Skipped 1514 previous similar messages Aug 27 10:59:51 fir-md1-s1 kernel: LustreError: 97657:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928701, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2492468900/0x5d9ee6e6574e0b9f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 39 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97657 timeout: 0 lvb_type: 0 Aug 27 10:59:57 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Aug 27 10:59:57 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (7): c: 0, oc: 0, rc: 8 Aug 27 10:59:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 2 seconds Aug 27 10:59:58 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 5 previous similar messages Aug 27 10:59:59 fir-md1-s1 kernel: Lustre: 42521:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f23e5872400 x1634141554970384/t0(0) o103->459a4674-896d-e57f-5fbe-6e6932e88880@10.9.106.17@o2ib4:4/0 lens 328/0 e 1 to 0 dl 1566928804 ref 2 fl 
New:/2/ffffffff rc 0/-1 Aug 27 10:59:59 fir-md1-s1 kernel: Lustre: 42521:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 74051 previous similar messages Aug 27 11:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:00:04 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 27 11:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 11:00:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 11:00:04 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 27 11:00:05 fir-md1-s1 kernel: Lustre: 30993:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-11s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f28e63e7200 x1642955208571968/t0(0) o103->f5ec0a6a-eeab-6471-6401-1b4f0dd33ce6@10.8.12.1@o2ib6:24/0 lens 328/0 e 0 to 0 dl 1566928794 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:00:05 fir-md1-s1 kernel: Lustre: 30993:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 13364 previous similar messages Aug 27 11:00:08 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 11:00:08 fir-md1-s1 kernel: Lustre: Skipped 506 previous similar messages Aug 27 11:00:08 fir-md1-s1 kernel: Lustre: 21462:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=4 reqQ=7541 recA=0, svcEst=20, delay=517 Aug 27 11:00:08 fir-md1-s1 kernel: Lustre: 21462:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 519 previous similar messages Aug 27 11:00:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds Aug 27 11:00:08 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 31 previous similar messages Aug 27 11:00:08 fir-md1-s1 kernel: Lustre: 25081:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:122s); client may timeout. req@ffff8f1dfa53b600 x1642476552109984/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:6/0 lens 336/0 e 1 to 0 dl 1566928686 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:00:08 fir-md1-s1 kernel: Lustre: 25081:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6471 previous similar messages Aug 27 11:00:09 fir-md1-s1 kernel: LustreError: 23587:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928719, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f36388b0d80/0x5d9ee6e65750f6d8 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 659 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23587 timeout: 0 lvb_type: 0 Aug 27 11:00:12 fir-md1-s1 kernel: LustreError: 22894:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.2.28@o2ib6: deadline 6:17s ago req@ffff8f2ab7ab9500 x1642950201150624/t0(0) o103->1f6170f1-7e4d-93e0-c338-52a60f322e56@10.8.2.28@o2ib6:25/0 lens 328/0 e 0 to 0 dl 1566928795 ref 1 fl Interpret:H/2/ffffffff rc 0/-1 Aug 27 11:00:12 fir-md1-s1 kernel: LustreError: 
22894:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 112 previous similar messages Aug 27 11:00:13 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928723, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2483ee6300/0x5d9ee6e657513219 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 661 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97646 timeout: 0 lvb_type: 0 Aug 27 11:00:13 fir-md1-s1 kernel: LustreError: 97646:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 216 previous similar messages Aug 27 11:00:22 fir-md1-s1 kernel: LustreError: 20720:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928731, 91s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2483ee0fc0/0x5d9ee6e6575142d5 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 667 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20720 timeout: 0 lvb_type: 0 Aug 27 11:00:22 fir-md1-s1 kernel: LustreError: 20720:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 33 previous similar messages Aug 27 11:00:38 fir-md1-s1 kernel: LustreError: 21379:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928748, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f330a39f500/0x5d9ee6e6575193a2 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 667 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21379 timeout: 0 lvb_type: 0 Aug 27 11:00:38 fir-md1-s1 kernel: LustreError: 21379:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 25 previous similar messages Aug 27 11:00:59 fir-md1-s1 kernel: Lustre: 
28232:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f38c0657850 x1642613908146672/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:00:59 fir-md1-s1 kernel: Lustre: 31000:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff8f18c1c39b00 x1642613907648352/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:00:59 fir-md1-s1 kernel: Lustre: 31000:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 302547 previous similar messages Aug 27 11:00:59 fir-md1-s1 kernel: Lustre: 28232:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 369 previous similar messages Aug 27 11:01:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 44s: evicting client at 10.9.109.15@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f1dc72b1440/0x5d9ee6e64f8d9fb1 lrc: 3/0,0 mode: PR/PR res: [0x2c002c73d:0xa871:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.109.15@o2ib4 remote: 0xc91531a730386720 expref: 2537 pid: 21461 timeout: 6043913 lvb_type: 0 Aug 27 11:01:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Aug 27 11:01:09 fir-md1-s1 kernel: LustreError: 20466:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f14766f7400 ns: mdt-fir-MDT0002_UUID lock: ffff8f43f720c5c0/0x5d9ee6e657523711 lrc: 3/0,0 mode: PW/PW res: [0x2c002bf6b:0xabb9:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.9.109.59@o2ib4 remote: 0xac528383b9d672a1 expref: 86 pid: 20466 timeout: 0 lvb_type: 0 Aug 27 11:01:09 fir-md1-s1 kernel: LustreError: 20466:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Aug 27 11:01:10 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 11:01:10 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 27 11:01:15 fir-md1-s1 kernel: LustreError: 23633:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928785, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0c594af500/0x5d9ee6e65751c6cb lrc: 3/1,0 mode: --/PR res: [0x200000401:0x6:0x0].0x0 bits 0x13/0x0 rrc: 639 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23633 timeout: 0 lvb_type: 0 Aug 27 11:01:15 fir-md1-s1 kernel: LustreError: 23633:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages Aug 27 11:01:22 fir-md1-s1 kernel: LustreError: 23708:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0e17dc9500 x1636782180476688/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:01:26 fir-md1-s1 kernel: LNet: Service thread pid 23454 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 11:01:26 fir-md1-s1 kernel: Pid: 23454, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:01:26 fir-md1-s1 kernel: Call Trace: Aug 27 11:01:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:01:26 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:01:26 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:01:26 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:01:26 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:01:26 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:01:26 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:01:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:01:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:01:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:01:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566928886.23454 Aug 27 11:01:27 fir-md1-s1 kernel: LustreError: 20725:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1a57169200 x1636782180479280/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:01:27 fir-md1-s1 kernel: LustreError: 20725:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Aug 27 11:01:28 fir-md1-s1 
kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:01:28 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 27 11:01:29 fir-md1-s1 kernel: LNet: Service thread pid 20727 was inactive for 201.75s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:01:29 fir-md1-s1 kernel: Pid: 20727, comm: mdt01_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:01:29 fir-md1-s1 kernel: Call Trace: Aug 27 11:01:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:01:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:01:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:01:29 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:01:29 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:01:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:01:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:01:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:01:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:01:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:01:29 fir-md1-s1 
kernel: [] 0xffffffffffffffff Aug 27 11:01:37 fir-md1-s1 kernel: LNet: Service thread pid 21436 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:01:37 fir-md1-s1 kernel: Pid: 21436, comm: mdt03_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:01:37 fir-md1-s1 kernel: Call Trace: Aug 27 11:01:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:01:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:01:37 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 11:01:37 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:01:37 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:01:37 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:01:37 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:01:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:01:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:01:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:01:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:01:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:01:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:01:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566928897.21436 Aug 27 11:01:41 fir-md1-s1 kernel: LNet: Service thread pid 97657 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 11:01:42 fir-md1-s1 kernel: Pid: 97657, comm: mdt01_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:01:42 fir-md1-s1 kernel: Call Trace: Aug 27 11:01:42 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:01:42 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:01:42 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 11:01:42 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:01:42 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:01:42 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:01:42 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:01:42 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:01:42 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:01:42 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:01:42 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:01:42 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:01:42 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to df993956-2257-9a73-35ef-341b2f75d156 (at 10.9.106.58@o2ib4) Aug 27 11:01:42 fir-md1-s1 kernel: Lustre: Skipped 4207 previous similar messages Aug 27 11:01:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566928901.97657 Aug 27 11:01:45 fir-md1-s1 kernel: Pid: 22006, comm: mdt01_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:01:45 fir-md1-s1 kernel: Call Trace: Aug 27 11:01:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:01:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:01:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:01:45 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:01:45 
fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:01:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:01:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:01:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:01:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:01:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:01:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:01:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566928905.22006 Aug 27 11:01:48 fir-md1-s1 kernel: LustreError: 20545:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f15060d8000 ns: mdt-fir-MDT0002_UUID lock: ffff8f1ad5221200/0x5d9ee6e657537c83 lrc: 3/0,0 mode: PR/PR res: [0x2c002c013:0xa64a:0x0].0x0 bits 0x1b/0x0 rrc: 25 type: IBT flags: 0x50200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905af0edb975b expref: 31 pid: 20545 timeout: 0 lvb_type: 0 Aug 27 11:01:48 fir-md1-s1 kernel: LustreError: 20545:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 20 previous similar messages Aug 27 11:01:48 fir-md1-s1 kernel: LustreError: 24564:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2cdf344c50 x1642682252248304/t0(0) o3->a7a4f5ec-5701-92f8-07e6-1f17cc662929@10.8.11.29@o2ib6:18/0 lens 488/440 e 0 to 0 dl 1566928938 ref 1 fl 
Interpret:/0/0 rc 0/0 Aug 27 11:01:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with a7a4f5ec-5701-92f8-07e6-1f17cc662929 (at 10.8.11.29@o2ib6), client will retry: rc -107 Aug 27 11:01:50 fir-md1-s1 kernel: LustreError: 44044:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1f804d4450 x1642950246885744/t0(0) o3->699c7076-7353-6d50-d7b8-a9fbda77aab1@10.8.13.2@o2ib6:19/0 lens 488/440 e 0 to 0 dl 1566928939 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:01:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 699c7076-7353-6d50-d7b8-a9fbda77aab1 (at 10.8.13.2@o2ib6), client will retry: rc -110 Aug 27 11:01:53 fir-md1-s1 kernel: LustreError: 46532:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f180b477c50 x1635363772474992/t0(0) o3->51002e48-a06e-3405-fcaa-ac377ed743af@10.8.17.9@o2ib6:22/0 lens 488/440 e 0 to 0 dl 1566928942 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:01:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 51002e48-a06e-3405-fcaa-ac377ed743af (at 10.8.17.9@o2ib6), client will retry: rc -107 Aug 27 11:01:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5271bf26-d8cc-fcfd-f77a-5ec5f97cfd6e (at 10.8.21.13@o2ib6) reconnecting Aug 27 11:01:58 fir-md1-s1 kernel: Lustre: Skipped 4342 previous similar messages Aug 27 11:01:58 fir-md1-s1 kernel: LNet: Service thread pid 20553 was inactive for 200.12s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 27 11:01:59 fir-md1-s1 kernel: LNet: Skipped 50 previous similar messages Aug 27 11:01:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566928918.20553 Aug 27 11:02:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566928921.97639 Aug 27 11:02:04 fir-md1-s1 kernel: LustreError: 46536:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f180b476450 x1640712352809600/t0(0) o3->b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9@10.9.114.15@o2ib4:4/0 lens 488/440 e 0 to 0 dl 1566928954 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9 (at 10.9.114.15@o2ib4), client will retry: rc -107 Aug 27 11:02:06 fir-md1-s1 kernel: LustreError: 20720:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5d1f99670bbf0 vs. last_xid 5d1f99670d70f req@ffff8f1e43293f00 x1638244784585712/t0(0) o101->8ec1acae-5541-1224-6330-34435f948ba9@10.9.106.61@o2ib4:6/0 lens 480/0 e 0 to 0 dl 1566928956 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:02:07 fir-md1-s1 kernel: Lustre: 31010:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-28), not sending early reply req@ffff8f2e796fd450 x1638101766190304/t0(0) o103->f0a8fbb7-06c4-ed16-a94f-6cea310ceb29@10.8.0.82@o2ib6:11/0 lens 328/0 e 0 to 0 dl 1566928931 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:02:07 fir-md1-s1 kernel: Lustre: 31010:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 139270 previous similar messages Aug 27 11:02:10 fir-md1-s1 kernel: Lustre: 20210:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566928909/real 1566928909] req@ffff8f117b8b4e00 x1636782180406976/t0(0) o103->fir-MDT0000-lwp-MDT0002@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566928930 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 11:02:10 fir-md1-s1 kernel: Lustre: 20210:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 465 previous 
similar messages Aug 27 11:02:13 fir-md1-s1 kernel: LustreError: 21540:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f20bc0a8450 x1640712352830192/t0(0) o3->b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9@10.9.114.15@o2ib4:11/0 lens 488/440 e 0 to 0 dl 1566928961 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9 (at 10.9.114.15@o2ib4), client will retry: rc -110 Aug 27 11:02:14 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-38s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f20bce6b300 x1642476559325280/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:6/0 lens 336/0 e 0 to 0 dl 1566928896 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:02:14 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 212669 previous similar messages Aug 27 11:02:16 fir-md1-s1 kernel: Lustre: 31001:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:91s); client may timeout. req@ffff8f375c9aec50 x1642476563379744/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:15/0 lens 336/0 e 0 to 0 dl 1566928845 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:02:16 fir-md1-s1 kernel: Lustre: 31001:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 29513 previous similar messages Aug 27 11:02:16 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 11:02:16 fir-md1-s1 kernel: Lustre: Skipped 4208 previous similar messages Aug 27 11:02:16 fir-md1-s1 kernel: Lustre: 27440:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=93 reqQ=226568 recA=0, svcEst=20, delay=1070 Aug 27 11:02:16 fir-md1-s1 kernel: Lustre: 27440:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 4208 previous similar messages Aug 27 11:02:20 fir-md1-s1 kernel: LustreError: 46812:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 6:90s ago req@ffff8f1dd5efa700 x1642476563720512/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:20/0 lens 336/0 e 0 to 0 dl 1566928850 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:02:20 fir-md1-s1 kernel: LustreError: 46812:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 5826 previous similar messages Aug 27 11:02:26 fir-md1-s1 kernel: LustreError: 97600:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f344090b850 x1640712352847392/t0(0) o3->b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9@10.9.114.15@o2ib4:24/0 lens 488/440 e 0 to 0 dl 1566928974 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with b2b2e1ee-e104-0a5b-43d1-12a1f3714ec9 (at 10.9.114.15@o2ib4), client will retry: rc -110 Aug 27 11:02:28 fir-md1-s1 kernel: LustreError: 20541:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5d5b356520e80 vs. 
last_xid 5d5b3565234cf req@ffff8f16cbd76000 x1642341107633792/t0(0) o101->f35471dc-4c42-bd06-27d8-a92f6bb41fe4@10.9.101.56@o2ib4:28/0 lens 480/0 e 0 to 0 dl 1566928978 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:02:37 fir-md1-s1 kernel: LustreError: 71835:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f087e7ae850 x1641788456459840/t0(0) o37->5d53e59f-af2d-163c-71c9-d97309c578be@10.9.109.11@o2ib4:7/0 lens 448/440 e 0 to 0 dl 1566928987 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:38 fir-md1-s1 kernel: LustreError: 21042:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f26bf00b050 x1642682254020016/t0(0) o3->a0a6b73c-2efb-3ccf-4782-75182c9f5b42@10.8.11.25@o2ib6:8/0 lens 488/440 e 0 to 0 dl 1566928988 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:40 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:02:41 fir-md1-s1 kernel: LustreError: 23599:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928871, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f0c29958240/0x5d9ee6e65752c07f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 46 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23599 timeout: 0 lvb_type: 0 Aug 27 11:02:41 fir-md1-s1 kernel: LustreError: 23599:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Aug 27 11:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c74cabd5-45b1-86e5-60f0-8f68b07a88b1 (at 10.9.103.24@o2ib4), client will retry: rc = -110 Aug 27 11:02:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 63a3f052-4761-28e2-a5a1-1e397d4515f3 (at 10.8.11.26@o2ib6), client will retry: rc -110 Aug 27 11:02:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 
11:02:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a687dd21-1bbe-233b-d907-3cc9986eac5f (at 10.9.103.28@o2ib4), client will retry: rc = -110 Aug 27 11:02:50 fir-md1-s1 kernel: LustreError: 24571:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2ebce45c50 x1642950162415264/t0(0) o3->b58eab23-f4fb-887d-6cb1-84427d552134@10.8.12.10@o2ib6:18/0 lens 488/440 e 0 to 0 dl 1566928998 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:52 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1324ad2000, cur 1566928971 expire 1566928821 last 1566928744 Aug 27 11:02:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 11:02:55 fir-md1-s1 kernel: LustreError: 10149:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566928883, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f33b6555c40/0x5d9ee6e657544832 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e65754485c expref: -99 pid: 10149 timeout: 0 lvb_type: 0 Aug 27 11:02:55 fir-md1-s1 kernel: LustreError: 10149:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 27 11:02:58 fir-md1-s1 kernel: LustreError: 21870:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f22c250f500 x1641925332921360/t0(0) o37->d8fb76db-f9d5-d4b5-1fe0-ea814c136f26@10.8.18.32@o2ib6:28/0 lens 448/440 e 0 to 0 dl 1566929008 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:02:58 fir-md1-s1 kernel: LustreError: 21870:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 5 previous similar messages Aug 27 11:02:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 96s: evicting client at 
10.8.29.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f40b0ab3f00/0x5d9ee6e657536d1e lrc: 3/0,0 mode: PR/PR res: [0x20002a08d:0x637:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.8.29.8@o2ib6 remote: 0xfccaff92bb5569e0 expref: 6744 pid: 10148 timeout: 6043971 lvb_type: 0 Aug 27 11:02:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 62 previous similar messages Aug 27 11:02:59 fir-md1-s1 kernel: LustreError: 23705:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f36ebc06c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2efa0e8900/0x5d9ee6e657541812 lrc: 3/0,0 mode: PW/PW res: [0x2c002cc9c:0x47af:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.104.57@o2ib4 remote: 0x686f4d54c1a25e51 expref: 11 pid: 23705 timeout: 0 lvb_type: 0 Aug 27 11:02:59 fir-md1-s1 kernel: LustreError: 23705:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Aug 27 11:03:07 fir-md1-s1 kernel: LustreError: 22280:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f17283ccb00 x1636782180533216/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:03:14 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1286e39c50 x1641917136738496/t0(0) o3->211c4417-cd43-2e0b-9f03-69995281dc54@10.9.104.1@o2ib4:14/0 lens 488/440 e 0 to 0 dl 1566929024 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:03:14 fir-md1-s1 kernel: LustreError: 22430:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 1 previous similar message Aug 27 11:03:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 11:03:19 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Aug 27 11:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 
d76fd0e2-c0e8-e1af-41b7-af513684736a (at 10.9.108.32@o2ib4), client will retry: rc = -110 Aug 27 11:03:30 fir-md1-s1 kernel: LustreError: 49463:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f29d3240050 x1642676852775344/t0(0) o3->ed50a6b6-7be7-e85d-35dc-d25d8404b3f4@10.9.106.71@o2ib4:29/0 lens 488/440 e 0 to 0 dl 1566929039 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:03:30 fir-md1-s1 kernel: LustreError: 49463:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 17 previous similar messages Aug 27 11:03:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ed50a6b6-7be7-e85d-35dc-d25d8404b3f4 (at 10.9.106.71@o2ib4), client will retry: rc -110 Aug 27 11:03:30 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 27 11:03:36 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:03:36 fir-md1-s1 kernel: LustreError: 21294:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3248b60050 x1634070032931184/t0(0) o3->bf4e4c3f-5edc-0503-2f79-5f56c2bc374a@10.9.102.49@o2ib4:5/0 lens 488/440 e 0 to 0 dl 1566929045 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:03:36 fir-md1-s1 kernel: LustreError: 21294:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 1 previous similar message Aug 27 11:03:37 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:03:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 27 11:03:59 fir-md1-s1 kernel: LustreError: 23704:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2f2b89d400 x1636782180559536/t0(0) o104->fir-MDT0002@10.8.0.68@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:03:59 fir-md1-s1 kernel: LustreError: 23704:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar 
message Aug 27 11:04:07 fir-md1-s1 kernel: LNet: Service thread pid 25681 was inactive for 200.43s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 11:04:07 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 27 11:04:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929047.25681 Aug 27 11:04:11 fir-md1-s1 kernel: LustreError: 46533:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1667d2ac50 x1638097061147088/t0(0) o3->60b13f2e-973a-45ed-69c0-02ff0226cf53@10.9.106.51@o2ib4:10/0 lens 488/440 e 0 to 0 dl 1566929080 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:04:11 fir-md1-s1 kernel: LustreError: 46533:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 3 previous similar messages Aug 27 11:04:23 fir-md1-s1 kernel: LustreError: 10506:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cbed0571bee0 vs. last_xid 5cbed0571cb6f req@ffff8f2a97f51800 x1631593742581472/t0(0) o36->fb8f22c1-ceb3-fa39-aea4-695a494d32c5@10.9.101.26@o2ib4:22/0 lens 488/0 e 0 to 0 dl 1566929092 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:04:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929071.23599 Aug 27 11:04:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 202038e6-756c-60d5-8477-3f096271d9f8 (at 10.9.113.9@o2ib4), client will retry: rc -107 Aug 27 11:04:35 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Aug 27 11:04:35 fir-md1-s1 kernel: LustreError: 27582:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f23ad7f8050 x1642682254238672/t0(0) o3->e1b48ba1-0c6b-a5b4-53d8-aaf71a91a4ec@10.8.11.32@o2ib6:4/0 lens 488/440 e 0 to 0 dl 1566929104 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:04:35 fir-md1-s1 kernel: LustreError: 27582:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 37 previous similar messages Aug 27 11:04:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929082.23562 Aug 27 11:04:44 fir-md1-s1 
kernel: LNet: Service thread pid 23588 completed after 200.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:04:44 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 27 11:04:46 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:04:46 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 27 11:04:50 fir-md1-s1 kernel: LustreError: 23558:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566929000, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f08e83c6780/0x5d9ee6e6576f6824 lrc: 3/0,1 mode: --/EX res: [0x200029851:0x2ab2:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23558 timeout: 0 lvb_type: 0 Aug 27 11:04:50 fir-md1-s1 kernel: LustreError: 23558:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 82 previous similar messages Aug 27 11:04:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) in 305 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0a40167400, cur 1566929093 expire 1566928943 last 1566928788 Aug 27 11:05:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with ef5fd4bc-3ade-f022-c480-f42cc4ae70e5 (at 10.9.105.22@o2ib4), client will retry: rc = -110 Aug 27 11:05:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 11:05:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929106.20464 Aug 27 11:05:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 126s: evicting client at 10.8.25.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f37382fd100/0x5d9ee6e6575370d6 lrc: 3/0,0 mode: PR/PR res: [0x20002a29c:0x324:0x0].0x0 bits 0x59/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.25.23@o2ib6 remote: 0x3e761dc7e53c28e4 expref: 16859 pid: 23761 timeout: 6044070 lvb_type: 0 Aug 27 11:05:08 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 26 previous similar messages Aug 27 11:05:11 fir-md1-s1 kernel: LustreError: 10308:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2522eb6000 ns: mdt-fir-MDT0000_UUID lock: ffff8f123b0cc800/0x5d9ee6e657900d7c lrc: 3/0,0 mode: PW/PW res: [0x200029851:0x2ab2:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.9.106.47@o2ib4 remote: 0x9545d26475ce7d1e expref: 19 pid: 10308 timeout: 0 lvb_type: 0 Aug 27 11:05:11 fir-md1-s1 kernel: LustreError: 10308:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 21 previous similar messages Aug 27 11:05:15 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff8f2b11d3e300 x1642590910557856/t0(0) o103->7364db02-d721-2d93-6c8c-160bd144c738@10.9.106.52@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:05:15 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 454552 previous similar messages Aug 27 
11:05:18 fir-md1-s1 kernel: LustreError: 23642:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f26e48fd100 x1636782180589808/t0(0) o104->fir-MDT0002@10.8.0.67@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:05:18 fir-md1-s1 kernel: LustreError: 23642:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Aug 27 11:05:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:05:20 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 11:05:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 23504e9e-38b0-73ab-6845-a2f9362c9ca3 (at 10.8.29.7@o2ib6), client will retry: rc = -110 Aug 27 11:05:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 11:05:31 fir-md1-s1 kernel: LustreError: 46589:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f20287bb450 x1642682249345632/t0(0) o3->2d8766b5-1393-f9e3-29ee-3b2281baed8f@10.8.11.33@o2ib6:0/0 lens 488/440 e 0 to 0 dl 1566929160 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:05:31 fir-md1-s1 kernel: LustreError: 46589:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 8 previous similar messages Aug 27 11:05:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to aca88a5d-734b-f4a5-55fa-0e35d21bcb4e (at 10.8.0.65@o2ib6) Aug 27 11:05:57 fir-md1-s1 kernel: Lustre: Skipped 9500 previous similar messages Aug 27 11:05:59 fir-md1-s1 kernel: LustreError: 21681:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566929067, 92s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f3868c4b3c0/0x5d9ee6e65787ee44 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e65787f0e4 
expref: -99 pid: 21681 timeout: 0 lvb_type: 0 Aug 27 11:05:59 fir-md1-s1 kernel: LustreError: 21681:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 27 11:06:05 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:06:05 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 11:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fir-MDT0000-lwp-OST001b_UUID (at 10.0.10.106@o2ib7) reconnecting Aug 27 11:06:14 fir-md1-s1 kernel: Lustre: Skipped 9078 previous similar messages Aug 27 11:06:18 fir-md1-s1 kernel: LNet: Service thread pid 23573 was inactive for 200.52s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 11:06:18 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Aug 27 11:06:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929178.23573 Aug 27 11:06:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929180.23575 Aug 27 11:06:23 fir-md1-s1 kernel: Lustre: 28235:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f24b455fb00 x1642476563417104/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:28/0 lens 336/0 e 1 to 0 dl 1566929188 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:06:23 fir-md1-s1 kernel: Lustre: 28235:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 588599 previous similar messages Aug 27 11:06:27 fir-md1-s1 kernel: Lustre: 20205:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566929180/real 1566929180] req@ffff8f172f6d2100 x1636782180522160/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 0 to 1 dl 1566929187 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 11:06:27 fir-md1-s1 kernel: Lustre: 
20205:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4934 previous similar messages Aug 27 11:06:29 fir-md1-s1 kernel: LNet: Service thread pid 23680 was inactive for 206.87s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:06:29 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 27 11:06:29 fir-md1-s1 kernel: Pid: 23680, comm: mdt03_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:06:29 fir-md1-s1 kernel: Call Trace: Aug 27 11:06:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:06:29 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:06:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:06:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:06:29 fir-md1-s1 kernel: Pid: 22280, 
comm: mdt01_042 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:06:29 fir-md1-s1 kernel: Call Trace: Aug 27 11:06:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:06:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:06:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:06:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:06:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:06:33 fir-md1-s1 kernel: Lustre: 20930:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-161s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f226f6fd850 x1642476566730656/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:19/0 lens 336/0 e 0 to 0 dl 1566929029 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:06:33 fir-md1-s1 kernel: Lustre: 20930:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 261498 previous similar messages Aug 27 11:06:33 fir-md1-s1 kernel: Lustre: 31005:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:212s); client may timeout. 
req@ffff8f2ed3ff4800 x1642613911974512/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:1/0 lens 328/0 e 0 to 0 dl 1566928981 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:06:33 fir-md1-s1 kernel: Lustre: 31005:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 145047 previous similar messages Aug 27 11:06:38 fir-md1-s1 kernel: Pid: 10151, comm: mdt02_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:06:38 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 11:06:38 fir-md1-s1 kernel: Lustre: Skipped 5212 previous similar messages Aug 27 11:06:38 fir-md1-s1 kernel: Lustre: 21738:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=91 reqQ=714717 recA=0, svcEst=20, delay=12220 Aug 27 11:06:38 fir-md1-s1 kernel: Lustre: 21738:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 5212 previous similar messages Aug 27 11:06:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 7372bd8e-4f77-9af0-e0f4-c1915e510b36 (at 10.9.103.22@o2ib4), client will retry: rc = -110 Aug 27 11:06:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 27 11:06:38 fir-md1-s1 kernel: Call Trace: Aug 27 11:06:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Aug 27 11:06:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:06:38 
fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:06:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:06:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:06:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:06:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929193.10151 Aug 27 11:06:38 fir-md1-s1 kernel: LustreError: 46813:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 20:224s ago req@ffff8f1b0faf7200 x1642476557163840/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:22/0 lens 336/0 e 1 to 0 dl 1566928972 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:06:38 fir-md1-s1 kernel: LustreError: 46813:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 22016 previous similar messages Aug 27 11:06:41 fir-md1-s1 kernel: LustreError: 23749:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2eb61ba400 x1636782180618208/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:06:41 fir-md1-s1 kernel: LustreError: 23749:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Aug 27 11:06:42 fir-md1-s1 kernel: LNet: Service thread pid 24579 was inactive for 205.09s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 11:06:42 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 27 11:06:42 fir-md1-s1 kernel: Pid: 24579, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:06:42 fir-md1-s1 kernel: Call Trace: Aug 27 11:06:42 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:06:42 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:06:42 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:06:42 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:06:42 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:06:42 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:06:42 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:06:42 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:06:42 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:06:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6), client will retry: rc -107 Aug 27 11:06:51 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Aug 27 11:06:59 fir-md1-s1 kernel: LustreError: 21882:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f19cfcb1e00 x1641917756145248/t0(0) o37->cdcfef42-b2ce-c8bc-b24e-50451baaa8f8@10.9.104.28@o2ib4:18/0 lens 448/440 e 0 to 0 
dl 1566929238 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:06:59 fir-md1-s1 kernel: LustreError: 21882:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 47 previous similar messages Aug 27 11:06:59 fir-md1-s1 kernel: LNet: Service thread pid 26256 was inactive for 201.83s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:06:59 fir-md1-s1 kernel: Pid: 26256, comm: mdt01_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:06:59 fir-md1-s1 kernel: Call Trace: Aug 27 11:06:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:06:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:06:59 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:06:59 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:06:59 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:06:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:06:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:06:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:06:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:06:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:06:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929218.26256 Aug 27 11:07:02 fir-md1-s1 kernel: LNet: Service thread pid 24579 completed 
after 226.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:07:02 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 11:07:13 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:07:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929237.97638 Aug 27 11:07:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929240.23704 Aug 27 11:07:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929244.20465 Aug 27 11:07:24 fir-md1-s1 kernel: LNet: Service thread pid 23571 completed after 228.31s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:07:24 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 27 11:07:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929258.10305 Aug 27 11:07:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 11:07:38 fir-md1-s1 kernel: Lustre: Skipped 140 previous similar messages Aug 27 11:07:39 fir-md1-s1 kernel: LNet: Service thread pid 26256 completed after 242.87s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:07:45 fir-md1-s1 kernel: LNet: Service thread pid 50444 completed after 247.71s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:07:45 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 27 11:07:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929267.21681 Aug 27 11:07:53 fir-md1-s1 kernel: LNet: Service thread pid 97657 completed after 571.80s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:07:53 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 27 11:07:59 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:07:59 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 27 11:08:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929287.21074 Aug 27 11:08:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929293.26258 Aug 27 11:08:27 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Aug 27 11:08:27 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (6): c: 0, oc: 1, rc: 8 Aug 27 11:08:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929308.21410 Aug 27 11:08:28 fir-md1-s1 kernel: LNet: Service thread pid 23617 completed after 291.14s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 27 11:08:29 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 2 seconds Aug 27 11:08:29 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 7 previous similar messages Aug 27 11:08:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929314.10332 Aug 27 11:08:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929315.23586 Aug 27 11:08:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929319.23642 Aug 27 11:08:42 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Aug 27 11:08:42 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (5): c: 0, oc: 0, rc: 8 Aug 27 11:08:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929332.21675 Aug 27 11:09:02 fir-md1-s1 kernel: LNet: Service thread pid 23625 completed after 323.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 27 11:09:02 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 11:09:06 fir-md1-s1 kernel: LustreError: 23579:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566929256, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f393307e540/0x5d9ee6e657b47745 lrc: 3/1,0 mode: --/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 733 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23579 timeout: 0 lvb_type: 0 Aug 27 11:09:06 fir-md1-s1 kernel: LustreError: 23579:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 169 previous similar messages Aug 27 11:09:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929347.23645 Aug 27 11:09:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929363.23574 Aug 27 11:09:24 fir-md1-s1 kernel: LustreError: 23598:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f05575b1400 ns: mdt-fir-MDT0002_UUID lock: ffff8f0dca01b180/0x5d9ee6e657b49353 lrc: 3/0,0 mode: PR/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 1866 type: IBT flags: 0x50200000000000 nid: 10.9.102.48@o2ib4 remote: 0x9b9e02b91f130144 expref: 10 pid: 23598 timeout: 0 lvb_type: 0 Aug 27 11:09:24 fir-md1-s1 kernel: LustreError: 23598:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 31 previous similar messages Aug 27 11:09:28 fir-md1-s1 kernel: LustreError: 71821:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f4175b95400 x1638292290536912/t0(0) o37->ef0748a0-58bc-3624-ed96-74860cd1e591@10.8.0.66@o2ib6:25/0 lens 448/440 e 0 to 0 dl 1566929395 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:09:28 fir-md1-s1 kernel: LustreError: 71821:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 12 previous similar messages Aug 27 11:09:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929369.23713 Aug 27 11:09:48 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Bulk IO write error with 3cf2eac3-6000-d1ad-26af-7aa417c35563 (at 10.9.103.25@o2ib4), client will retry: rc = -110 Aug 27 11:09:48 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:09:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 34s: evicting client at 10.9.102.34@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f0b1eee8900/0x5d9ee6e64dc1d66b lrc: 3/0,0 mode: PR/PR res: [0x200029a60:0x3ff4:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.102.34@o2ib4 remote: 0x43472c59ff7dbbdd expref: 499 pid: 10589 timeout: 6044454 lvb_type: 0 Aug 27 11:09:59 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 47 previous similar messages Aug 27 11:10:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929406.20467 Aug 27 11:10:18 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f291aa1d800 Aug 27 11:10:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929423.23618 Aug 27 11:10:27 fir-md1-s1 kernel: LustreError: 20187:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f313c22d400 Aug 27 11:10:28 fir-md1-s1 kernel: LNet: Service thread pid 21370 completed after 222.17s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 27 11:10:28 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Aug 27 11:10:31 fir-md1-s1 kernel: LustreError: 21669:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0fe1d5dd00 x1636782180713568/t0(0) o104->fir-MDT0000@10.9.108.61@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:10:31 fir-md1-s1 kernel: LustreError: 21669:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Aug 27 11:10:47 fir-md1-s1 kernel: LNet: Service thread pid 23744 was inactive for 200.07s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 11:10:47 fir-md1-s1 kernel: LNet: Skipped 57 previous similar messages Aug 27 11:10:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929447.23744 Aug 27 11:10:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929453.23723 Aug 27 11:11:19 fir-md1-s1 kernel: LustreError: 52410:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f0d2ee92450 x1631569921305696/t0(0) o4->25c05458-1ff8-5b3c-505b-360943a414ba@10.9.104.66@o2ib4:18/0 lens 488/448 e 0 to 0 dl 1566929508 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:11:19 fir-md1-s1 kernel: LustreError: 52410:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 31 previous similar messages Aug 27 11:11:27 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 228 seconds. I think it's dead, and I am evicting it. 
exp ffff8f18d6a03400, cur 1566929487 expire 1566929337 last 1566929259 Aug 27 11:12:00 fir-md1-s1 kernel: LustreError: 23641:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566929430, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f4078240480/0x5d9ee6e657c97569 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e657c97da4 expref: -99 pid: 23641 timeout: 0 lvb_type: 0 Aug 27 11:12:45 fir-md1-s1 kernel: LNet: Service thread pid 21678 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:12:45 fir-md1-s1 kernel: Pid: 21678, comm: mdt03_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:12:45 fir-md1-s1 kernel: Call Trace: Aug 27 11:12:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:12:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:12:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:12:45 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:12:45 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:12:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:12:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:12:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:12:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:12:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:12:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929565.21678 Aug 27 11:12:47 fir-md1-s1 kernel: Pid: 97657, comm: mdt01_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:12:47 fir-md1-s1 kernel: Call Trace: Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:12:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:12:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:12:47 fir-md1-s1 kernel: Pid: 27320, comm: mdt00_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:12:47 fir-md1-s1 kernel: Call Trace: Aug 27 11:12:47 fir-md1-s1 kernel: [] 
ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:12:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:12:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:12:47 fir-md1-s1 kernel: Pid: 20460, comm: mdt01_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:12:47 fir-md1-s1 kernel: Call Trace: Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 
[mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:12:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:12:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:12:47 fir-md1-s1 kernel: Pid: 23742, comm: mdt02_093 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:12:47 fir-md1-s1 kernel: Call Trace: Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:12:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:12:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:12:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:12:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:12:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929568.23594 Aug 27 11:12:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929571.10253 Aug 27 11:12:53 fir-md1-s1 
kernel: LNet: Service thread pid 20464 completed after 670.29s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:12:53 fir-md1-s1 kernel: LNet: Skipped 19 previous similar messages Aug 27 11:13:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929585.24584 Aug 27 11:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 85739f26-323d-803b-2431-48193ac3fda7 (at 10.8.2.9@o2ib6), client will retry: rc -107 Aug 27 11:13:05 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Aug 27 11:13:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929593.23697 Aug 27 11:13:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with a687dd21-1bbe-233b-d907-3cc9986eac5f (at 10.9.103.28@o2ib4), client will retry: rc = -110 Aug 27 11:13:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 11:13:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929598.23571 Aug 27 11:13:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929600.23572 Aug 27 11:13:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929603.23685 Aug 27 11:13:41 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 11:13:41 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (5): c: 0, oc: 0, rc: 8 Aug 27 11:13:41 fir-md1-s1 kernel: LustreError: 52410:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk WRITE req@ffff8f0dba9e7050 x1631568524867328/t0(0) o4->611b4745-d6b8-5f49-019a-332c1d4de3e3@10.9.106.21@o2ib4:5/0 lens 488/448 e 0 to 0 dl 1566929645 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:13:41 fir-md1-s1 kernel: LustreError: 52410:0:(ldlm_lib.c:3252:target_bulk_io()) Skipped 3 previous similar messages Aug 27 11:13:42 fir-md1-s1 kernel: LNet: 
20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds Aug 27 11:13:42 fir-md1-s1 kernel: LNetError: 25680:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.26.26@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:13:42 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 62 previous similar messages Aug 27 11:13:43 fir-md1-s1 kernel: LNetError: 20464:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.12.10@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:13:43 fir-md1-s1 kernel: LNetError: 20464:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 29 previous similar messages Aug 27 11:13:43 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 3, status -103, desc ffff8f0c29397800 Aug 27 11:13:43 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 3, status -103, desc ffff8f0c29394600 Aug 27 11:13:43 fir-md1-s1 kernel: LustreError: 35230:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff8f1286e38450 x1631575438356384/t0(0) o4->bd9d34e2-75c4-4164-e97b-d054cfaf6bb8@10.9.105.26@o2ib4:5/0 lens 488/448 e 0 to 0 dl 1566929645 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:13:44 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f1b8d95e000 Aug 27 11:13:44 fir-md1-s1 kernel: LNetError: 23616:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.24.31@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:13:44 fir-md1-s1 kernel: LNetError: 23616:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 55 previous similar messages Aug 27 11:13:44 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28607d9600 Aug 27 11:13:44 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f28607d8600 Aug 27 11:13:47 
fir-md1-s1 kernel: Lustre: 27442:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 169s req@ffff8f22b0a5a100 x1642476568568480/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:13:47 fir-md1-s1 kernel: Lustre: 27442:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1549193 previous similar messages Aug 27 11:13:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929630.23621 Aug 27 11:13:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929632.23636 Aug 27 11:13:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929635.21667 Aug 27 11:14:10 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 11:14:10 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Aug 27 11:14:10 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.108@o2ib7 (7): c: 3, oc: 0, rc: 8 Aug 27 11:14:10 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Aug 27 11:14:11 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f18af71b450 x1642950252871232/t0(0) o3->8421e80b-bfe6-f745-2e6f-3a6adf378e7d@10.8.12.16@o2ib6:4/0 lens 488/440 e 0 to 0 dl 1566929674 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:14:11 fir-md1-s1 kernel: LustreError: 21710:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 27 previous similar messages Aug 27 11:14:15 fir-md1-s1 kernel: LustreError: 20992:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+10s req@ffff8f20be6a8000 x1636254908482592/t0(0) o37->09a7c9a8-33d7-6407-c049-3280a0ca2983@10.9.106.46@o2ib4:5/0 lens 448/440 e 0 to 0 dl 1566929645 ref 1 fl Interpret:/0/0 rc 0/0 Aug 
27 11:14:15 fir-md1-s1 kernel: LustreError: 20992:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message Aug 27 11:14:20 fir-md1-s1 kernel: LustreError: 21900:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 31+8s req@ffff8f26ade97500 x1642590911028224/t0(0) o37->7364db02-d721-2d93-6c8c-160bd144c738@10.9.106.52@o2ib4:12/0 lens 448/440 e 0 to 0 dl 1566929652 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:14:28 fir-md1-s1 kernel: LustreError: 55539:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+0s req@ffff8f2c4e9d6850 x1639158801294464/t0(0) o256->44de373f-6d8e-55b1-ac31-0402f8221bc6@10.9.104.25@o2ib4:28/0 lens 304/240 e 1 to 0 dl 1566929668 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:14:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 8285cc10-76a0-d702-553d-5ac452bcb98d (at 10.9.102.44@o2ib4) Aug 27 11:14:30 fir-md1-s1 kernel: Lustre: Skipped 20030 previous similar messages Aug 27 11:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 52c71c06-aadc-cfcc-938a-f9d3fb657c1b (at 10.9.108.45@o2ib4) reconnecting Aug 27 11:14:46 fir-md1-s1 kernel: Lustre: Skipped 19585 previous similar messages Aug 27 11:14:53 fir-md1-s1 kernel: LustreError: 21412:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f30aab1e300 x1636782180947456/t0(0) o104->fir-MDT0002@10.9.0.64@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:14:53 fir-md1-s1 kernel: LustreError: 21412:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 6 previous similar messages Aug 27 11:14:55 fir-md1-s1 kernel: Lustre: 30991:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f21b574dd00 x1642950265364752/t0(0) o103->d82284fc-44dc-9617-a284-754474e2e00e@10.8.2.16@o2ib6:0/0 lens 328/0 e 0 to 0 dl 1566929700 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:14:55 fir-md1-s1 kernel: Lustre: 30991:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 
2234570 previous similar messages Aug 27 11:15:02 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-200s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f07d1da8000 x1642476559530048/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:12/0 lens 336/0 e 0 to 0 dl 1566929502 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:15:02 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 592355 previous similar messages Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: 20217:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566929684/real 1566929684] req@ffff8f2ad2f7cb00 x1636782180535104/t0(0) o103->fir-MDT0000-lwp-MDT0002@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566929705 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: 36768:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1025s); client may timeout. req@ffff8f0f8a01b600 x1642476566958816/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 1 to 0 dl 1566928680 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: 36768:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 859398 previous similar messages Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: 20217:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5483 previous similar messages Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: 33422:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=3 reqQ=185596 recA=0, svcEst=20, delay=0 Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: 33422:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 17480 previous similar messages Aug 27 11:15:05 fir-md1-s1 kernel: Lustre: Skipped 17491 previous similar messages Aug 27 11:15:09 fir-md1-s1 kernel: LustreError: 20369:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.0.68@o2ib6: deadline 6:728s ago req@ffff8f1789181200 x1642705033151024/t0(0) o103->@:1/0 lens 328/0 e 0 to 0 dl 1566928981 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 11:15:09 fir-md1-s1 kernel: LustreError: 20369:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 52696 previous similar messages Aug 27 11:15:37 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:15:37 fir-md1-s1 kernel: LNetError: 20194:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 8 previous similar messages Aug 27 11:16:06 fir-md1-s1 kernel: LustreError: 71847:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f0b33734050 x1634198384053936/t0(0) o37->2d756374-54f3-168c-5d53-2ddb4062024e@10.9.109.55@o2ib4:17/0 lens 448/440 e 0 to 0 dl 1566929777 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 11:16:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.2.22@o2ib6, removing former export from same NID Aug 27 11:16:10 fir-md1-s1 kernel: Lustre: Skipped 507 previous similar messages Aug 27 11:16:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 11:16:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Aug 27 11:16:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA 
with 10.0.10.108@o2ib7 (6): c: 4, oc: 0, rc: 8 Aug 27 11:16:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Aug 27 11:16:21 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.108@o2ib7: 32 seconds Aug 27 11:16:21 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 83 previous similar messages Aug 27 11:16:22 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f34524fca00 Aug 27 11:16:22 fir-md1-s1 kernel: LustreError: 20185:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f34524fb000 Aug 27 11:16:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929785.21332 Aug 27 11:16:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929787.24579 Aug 27 11:16:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929794.23608 Aug 27 11:16:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929801.20468 Aug 27 11:16:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929803.97666 Aug 27 11:16:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929805.20983 Aug 27 11:16:46 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:16:46 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Aug 27 11:16:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929806.25677 Aug 27 11:16:53 fir-md1-s1 kernel: LustreError: 27058:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+5s req@ffff8f1d9592ef00 x1641926063832992/t0(0) o37->2f5061ce-64e2-39d7-9721-0145f3e60db9@10.9.107.20@o2ib4:18/0 lens 448/440 e 0 to 0 dl 1566929808 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:16:53 
fir-md1-s1 kernel: LustreError: 27058:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 15 previous similar messages Aug 27 11:17:03 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Aug 27 11:17:03 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Aug 27 11:17:03 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (5): c: 0, oc: 0, rc: 8 Aug 27 11:17:03 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Aug 27 11:17:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929864.20541 Aug 27 11:17:57 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566929783, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2a95701b00/0x5d9ee6e658006ac8 lrc: 3/0,1 mode: --/PW res: [0x2c002c013:0xa64a:0x0].0x0 bits 0x40/0x0 rrc: 63 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23746 timeout: 0 lvb_type: 0 Aug 27 11:17:57 fir-md1-s1 kernel: LustreError: 23746:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 868 previous similar messages Aug 27 11:18:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 01c5290e-2f99-d714-0fa9-403481192ee7 (at 10.9.103.1@o2ib4), client will retry: rc = -110 Aug 27 11:18:00 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 27 11:18:12 fir-md1-s1 kernel: LustreError: 23598:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4518b37c00 ns: mdt-fir-MDT0000_UUID lock: ffff8f13a7fe6780/0x5d9ee6e658179cd5 lrc: 3/0,0 mode: PW/PW res: [0x200029918:0x9fc:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.105.70@o2ib4 remote: 
0x4e970e92d11a547d expref: 2773 pid: 23598 timeout: 0 lvb_type: 0 Aug 27 11:18:12 fir-md1-s1 kernel: LustreError: 23598:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 131 previous similar messages Aug 27 11:18:25 fir-md1-s1 kernel: LustreError: 20999:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f2723a04b00 x1642590911164448/t0(0) o37->7364db02-d721-2d93-6c8c-160bd144c738@10.9.106.52@o2ib4:14/0 lens 448/440 e 0 to 0 dl 1566929924 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 11:18:38 fir-md1-s1 kernel: LustreError: 23740:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5d645e2447a60 vs. last_xid 5d645e2447c2f req@ffff8f2bdd71f200 x1642970520779360/t0(0) o36->1ad32d15-2c38-f5f1-9134-0bc2f5f63fbe@10.9.103.23@o2ib4:8/0 lens 488/0 e 0 to 0 dl 1566929948 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:18:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.115.12@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 11:18:55 fir-md1-s1 kernel: LNet: Service thread pid 23626 was inactive for 200.18s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 11:18:55 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 11:18:55 fir-md1-s1 kernel: Pid: 23626, comm: mdt03_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:18:55 fir-md1-s1 kernel: Call Trace: Aug 27 11:18:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:18:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:18:55 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:18:55 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:18:55 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:18:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:18:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:18:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:18:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:18:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:18:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929935.23626 Aug 27 11:19:00 fir-md1-s1 kernel: LustreError: 20721:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cbe1900d7bc0 vs. 
last_xid 5cbe1900d7bcf req@ffff8f1a09698900 x1631544528436160/t0(0) o36->1ae7de3e-f83c-4930-305c-63330132f512@10.9.107.60@o2ib4:0/0 lens 488/0 e 0 to 0 dl 1566929970 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:19:01 fir-md1-s1 kernel: Pid: 10305, comm: mdt00_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:19:01 fir-md1-s1 kernel: Call Trace: Aug 27 11:19:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:19:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:19:01 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:19:01 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:19:01 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:19:01 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:19:01 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:19:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:19:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:19:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:19:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929941.10305 Aug 27 11:19:03 fir-md1-s1 kernel: Pid: 23709, comm: mdt03_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:19:03 fir-md1-s1 kernel: Call Trace: Aug 27 11:19:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 
27 11:19:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:19:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:19:03 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:19:03 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:19:03 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:19:03 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:19:03 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:19:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:19:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:19:04 fir-md1-s1 kernel: Pid: 20996, comm: mdt02_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:19:04 fir-md1-s1 kernel: Call Trace: Aug 27 11:19:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] 
mdt_reint_setattr+0x6c8/0x1340 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:19:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:19:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:19:04 fir-md1-s1 kernel: Pid: 27318, comm: mdt02_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:19:04 fir-md1-s1 kernel: Call Trace: Aug 27 11:19:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:19:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:19:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:19:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:19:04 
fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:19:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929954.10582 Aug 27 11:19:37 fir-md1-s1 kernel: LustreError: 21873:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+7s req@ffff8f23115aad00 x1639310164185888/t0(0) o37->e3e47ab7-d323-84f4-e101-79c91130f0fa@10.9.116.3@o2ib4:0/0 lens 448/440 e 0 to 0 dl 1566929970 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 11:19:37 fir-md1-s1 kernel: LustreError: 21873:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages Aug 27 11:19:43 fir-md1-s1 kernel: LNet: Service thread pid 21410 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 11:19:43 fir-md1-s1 kernel: LNet: Skipped 73 previous similar messages Aug 27 11:19:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566929983.21410 Aug 27 11:19:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 71s: evicting client at 10.9.106.31@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f4256c9aac0/0x5d9ee6e658237866 lrc: 3/0,0 mode: CR/CR res: [0x2c002cd46:0x2:0x0].0x0 bits 0x9/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.106.31@o2ib4 remote: 0x126e52e885f6c27a expref: 25 pid: 23717 timeout: 6045014 lvb_type: 0 Aug 27 11:19:56 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 112 previous similar messages Aug 27 11:20:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f448226cc00, cur 1566930000 expire 1566929850 last 1566929773 Aug 27 11:20:00 fir-md1-s1 kernel: LustreError: 71888:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f4505a1b000 x1634183247993696/t0(0) o37->9871df44-3e50-912f-f998-77063c2447b4@10.9.109.16@o2ib4:0/0 lens 448/440 e 0 to 0 dl 1566930030 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:20:00 fir-md1-s1 kernel: LustreError: 71888:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 237 previous similar messages Aug 27 11:20:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930014.50579 Aug 27 11:20:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930022.23583 Aug 27 11:20:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930024.23582 Aug 27 11:20:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930034.97644 Aug 27 11:20:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930047.20555 Aug 27 11:20:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930048.23666 Aug 27 11:20:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930049.23656 Aug 27 11:20:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930050.24577 Aug 27 11:20:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930056.23700 Aug 27 11:20:59 fir-md1-s1 kernel: LNet: Service thread pid 23661 completed after 210.36s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 27 11:20:59 fir-md1-s1 kernel: LNet: Skipped 51 previous similar messages Aug 27 11:20:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930059.23587 Aug 27 11:21:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930105.23589 Aug 27 11:21:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930111.21416 Aug 27 11:22:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 6f41efb2-daa5-aa06-2c6d-b61d2b47cb3d (at 10.9.114.13@o2ib4), client will retry: rc -107 Aug 27 11:22:00 fir-md1-s1 kernel: Lustre: Skipped 83 previous similar messages Aug 27 11:22:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930122.97646 Aug 27 11:22:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930143.10150 Aug 27 11:22:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930145.23694 Aug 27 11:22:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930150.23723 Aug 27 11:23:16 fir-md1-s1 kernel: LustreError: 49248:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1443dad050 x1641916374786800/t0(0) o3->fc4076f1-0bfd-2b0a-8710-4b2a9ef8582d@10.9.104.3@o2ib4:16/0 lens 488/440 e 0 to 0 dl 1566930226 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:23:16 fir-md1-s1 kernel: LustreError: 49248:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 36 previous similar messages Aug 27 11:23:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930196.20720 Aug 27 11:23:27 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3e484a2e00 Aug 27 11:23:47 fir-md1-s1 kernel: Lustre: 22136:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 20s req@ffff8f13a7544e00 x1642476566694320/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:23:47 fir-md1-s1 kernel: Lustre: 
22136:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3009438 previous similar messages Aug 27 11:23:57 fir-md1-s1 kernel: LustreError: 21918:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+1s req@ffff8f0b7490bc00 x1642677036212912/t0(0) o37->102b8c35-a71b-64de-95bf-d6c1350f6af0@10.9.106.25@o2ib4:26/0 lens 448/440 e 0 to 0 dl 1566930236 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:24:02 fir-md1-s1 kernel: LNet: Service thread pid 21676 was inactive for 200.65s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:24:02 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 11:24:02 fir-md1-s1 kernel: Pid: 21676, comm: mdt02_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:24:03 fir-md1-s1 kernel: Call Trace: Aug 27 11:24:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:24:03 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:24:03 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:24:03 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:24:03 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:24:03 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:24:03 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:24:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 
Aug 27 11:24:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:24:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:24:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930242.21676 Aug 27 11:24:06 fir-md1-s1 kernel: Pid: 20466, comm: mdt03_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:24:06 fir-md1-s1 kernel: Call Trace: Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:24:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:24:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:24:06 fir-md1-s1 kernel: Pid: 21483, comm: mdt01_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:24:06 fir-md1-s1 kernel: Call Trace: Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 
[ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:24:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:24:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:24:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:24:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 29b52eb8-dab6-4b88-7a0d-057d59d63b47 (at 10.8.17.22@o2ib6) Aug 27 11:24:30 fir-md1-s1 kernel: Lustre: Skipped 27771 previous similar messages Aug 27 11:24:35 fir-md1-s1 kernel: Pid: 21679, comm: mdt02_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:24:35 fir-md1-s1 kernel: Call Trace: Aug 27 11:24:35 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:24:35 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:24:35 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:24:35 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:24:35 fir-md1-s1 
kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:24:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:24:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:24:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:24:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:24:35 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:24:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930275.21679 Aug 27 11:24:43 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Aug 27 11:24:43 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Aug 27 11:24:43 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (0): c: 0, oc: 1, rc: 8 Aug 27 11:24:43 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Aug 27 11:24:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 2 seconds Aug 27 11:24:44 fir-md1-s1 kernel: LNetError: 49470:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.18.35@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:24:44 fir-md1-s1 kernel: LNetError: 49470:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 9 previous similar messages Aug 27 11:24:44 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 40 previous similar messages Aug 27 11:24:44 
fir-md1-s1 kernel: LNetError: 55551:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.23.35@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:24:45 fir-md1-s1 kernel: LNetError: 55551:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 43 previous similar messages Aug 27 11:24:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:24:45 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Aug 27 11:24:45 fir-md1-s1 kernel: LustreError: 20189:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f15bee6cc00 Aug 27 11:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a221bdf6-ae93-f8c1-7e76-63b55966076b (at 10.8.7.2@o2ib6) reconnecting Aug 27 11:24:46 fir-md1-s1 kernel: Lustre: Skipped 26917 previous similar messages Aug 27 11:24:55 fir-md1-s1 kernel: Lustre: 22284:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2159498000 x1641928343789216/t0(0) o101->af04ed26-0e4b-db45-3414-20245014a46d@10.8.27.34@o2ib6:0/0 lens 1784/3288 e 0 to 0 dl 1566930300 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 11:24:55 fir-md1-s1 kernel: Lustre: 22284:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2693569 previous similar messages Aug 27 11:25:02 fir-md1-s1 kernel: Lustre: 21305:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f2974f9a700 x1642476560749008/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:27/0 lens 336/0 e 0 to 0 dl 1566930297 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:25:02 fir-md1-s1 kernel: Lustre: 21305:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 999872 previous similar messages Aug 27 11:25:05 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:469s); client may timeout. req@ffff8f110b3e2400 x1636659587822528/t0(0) o103->fir-MDT0000-lwp-OST0018_UUID@10.0.10.105@o2ib7:16/0 lens 328/0 e 0 to 0 dl 1566929836 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:25:05 fir-md1-s1 kernel: Lustre: 30998:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1531902 previous similar messages Aug 27 11:25:05 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 11:25:05 fir-md1-s1 kernel: Lustre: 48116:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=5 reqQ=21128 recA=0, svcEst=20, delay=0 Aug 27 11:25:05 fir-md1-s1 kernel: Lustre: 48116:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 28026 previous similar messages Aug 27 11:25:05 fir-md1-s1 kernel: Lustre: Skipped 28027 previous similar messages Aug 27 11:25:16 fir-md1-s1 kernel: LustreError: 31015:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.25.23@o2ib6: deadline 30:1179s ago req@ffff8f2a03df9200 x1642684107840560/t0(0) o103->@:7/0 lens 328/0 e 0 to 0 dl 1566929137 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 11:25:16 fir-md1-s1 kernel: LustreError: 31015:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 43058 previous similar messages Aug 27 11:25:20 fir-md1-s1 kernel: Lustre: 20205:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566930313/real 1566930313] req@ffff8f12a2e8ec00 x1636782181251216/t0(0) 
o13->fir-OST0005-osc-MDT0000@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1566930320 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Aug 27 11:25:21 fir-md1-s1 kernel: Lustre: 20205:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6951 previous similar messages Aug 27 11:25:34 fir-md1-s1 kernel: LustreError: 107732:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1d38810f00 x1636782181259520/t0(0) o105->fir-MDT0002@10.9.103.14@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:25:34 fir-md1-s1 kernel: LustreError: 107732:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 15 previous similar messages Aug 27 11:25:59 fir-md1-s1 kernel: LustreError: 25550:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.104.28@o2ib4 arrived at 1566930332 with bad export cookie 6746083064280769920 Aug 27 11:25:59 fir-md1-s1 kernel: LustreError: 25550:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 499 previous similar messages Aug 27 11:26:00 fir-md1-s1 kernel: LustreError: 22893:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.107.10@o2ib4 arrived at 1566930331 with bad export cookie 6746083168837623377 Aug 27 11:26:00 fir-md1-s1 kernel: LustreError: 22893:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 81 previous similar messages Aug 27 11:26:15 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.25@o2ib6, removing former export from same NID Aug 27 11:26:15 fir-md1-s1 kernel: Lustre: Skipped 725 previous similar messages Aug 27 11:26:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Aug 27 11:26:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Aug 27 11:26:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (4): c: 0, oc: 0, rc: 8 Aug 27 11:26:19 fir-md1-s1 kernel: 
LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Aug 27 11:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:26:53 fir-md1-s1 kernel: Lustre: Skipped 76 previous similar messages Aug 27 11:27:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 02dfd968-e7b1-52cc-0db8-aa0d10c0832c (at 10.9.102.19@o2ib4), client will retry: rc = -110 Aug 27 11:27:05 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Aug 27 11:27:30 fir-md1-s1 kernel: LustreError: 23672:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566930359, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f14a57ed100/0x5d9ee6e658848a2e lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e658848a74 expref: -99 pid: 23672 timeout: 0 lvb_type: 0 Aug 27 11:27:30 fir-md1-s1 kernel: LustreError: 23672:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Aug 27 11:28:02 fir-md1-s1 kernel: LustreError: 23558:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566930392, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f22a6000000/0x5d9ee6e65890693f lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 55 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23558 timeout: 0 lvb_type: 0 Aug 27 11:28:02 fir-md1-s1 kernel: LustreError: 23558:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 153 previous similar messages Aug 27 11:28:12 fir-md1-s1 kernel: LustreError: 22281:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export 
ffff8f28da258800 ns: mdt-fir-MDT0000_UUID lock: ffff8f16c6fb5e80/0x5d9ee6e65891de84 lrc: 3/0,0 mode: PW/PW res: [0x20002990b:0x2505:0x0].0x0 bits 0x40/0x0 rrc: 15 type: IBT flags: 0x50200400000020 nid: 10.9.102.41@o2ib4 remote: 0x26e7a145189a1c0d expref: 21 pid: 22281 timeout: 0 lvb_type: 0 Aug 27 11:28:12 fir-md1-s1 kernel: LustreError: 21429:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f445c1ec000 ns: mdt-fir-MDT0002_UUID lock: ffff8f1995e89d40/0x5d9ee6e658942862 lrc: 3/0,0 mode: PW/PW res: [0x2c002cd49:0x8:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.101.56@o2ib4 remote: 0xc26f5e2374568f42 expref: 3 pid: 21429 timeout: 0 lvb_type: 0 Aug 27 11:28:12 fir-md1-s1 kernel: LustreError: 21429:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 86 previous similar messages Aug 27 11:28:12 fir-md1-s1 kernel: LustreError: 22281:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Aug 27 11:29:14 fir-md1-s1 kernel: LNet: Service thread pid 21413 was inactive for 200.14s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 11:29:14 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 27 11:29:14 fir-md1-s1 kernel: Pid: 21413, comm: mdt02_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:29:14 fir-md1-s1 kernel: Call Trace: Aug 27 11:29:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:29:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:29:14 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 11:29:14 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:29:14 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:29:14 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:29:14 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:29:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:29:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:29:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:29:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:29:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:29:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:29:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930554.21413 Aug 27 11:29:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f2e2dced000, cur 1566930559 expire 1566930409 last 1566930332 Aug 27 11:29:19 fir-md1-s1 kernel: Pid: 21681, comm: mdt03_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:29:19 fir-md1-s1 kernel: Call Trace: Aug 27 11:29:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:29:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:29:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:29:19 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:29:19 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:29:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:29:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:29:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:29:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:29:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:29:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:29:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930559.21681 Aug 27 11:29:22 fir-md1-s1 kernel: Pid: 23672, comm: mdt00_100 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:29:22 fir-md1-s1 kernel: Call Trace: Aug 27 11:29:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 
27 11:29:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:29:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:29:22 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:29:22 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:29:22 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:29:22 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:29:22 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:29:22 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:29:22 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:29:22 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:29:25 fir-md1-s1 kernel: Pid: 23454, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:29:25 fir-md1-s1 kernel: Call Trace: Aug 27 11:29:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:29:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:29:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:29:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:29:25 fir-md1-s1 kernel: [] 
mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:29:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:29:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:29:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:29:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:29:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:29:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930565.23454 Aug 27 11:29:37 fir-md1-s1 kernel: Pid: 97645, comm: mdt01_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:29:37 fir-md1-s1 kernel: Call Trace: Aug 27 11:29:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:29:37 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:29:37 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:29:37 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:29:37 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:29:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:29:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:29:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:29:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:29:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:29:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930577.97645 Aug 27 11:29:46 fir-md1-s1 kernel: LNet: Service thread pid 21421 was inactive for 202.85s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 11:29:46 fir-md1-s1 kernel: LNet: Skipped 48 previous similar messages Aug 27 11:29:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930586.21421 Aug 27 11:29:47 fir-md1-s1 kernel: LNet: Service thread pid 23454 completed after 222.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:29:48 fir-md1-s1 kernel: LNet: Skipped 50 previous similar messages Aug 27 11:29:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930592.23558 Aug 27 11:29:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930595.23732 Aug 27 11:30:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930602.23737 Aug 27 11:30:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930609.23586 Aug 27 11:30:13 fir-md1-s1 kernel: LustreError: 71846:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f26d0a73900 x1642596393425552/t0(0) o37->301240a6-b5f4-7a43-10f6-31377a5dff80@10.9.106.19@o2ib4:5/0 lens 448/440 e 0 to 0 dl 1566930635 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:30:13 fir-md1-s1 kernel: LustreError: 71846:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 96 previous similar messages Aug 27 11:30:23 fir-md1-s1 kernel: LustreError: 25681:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5d5b35655e940 vs. 
last_xid 5d5b35655f99f req@ffff8f0d6ab9b900 x1642341107886400/t0(0) o101->f35471dc-4c42-bd06-27d8-a92f6bb41fe4@10.9.101.56@o2ib4:23/0 lens 376/0 e 0 to 0 dl 1566930653 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:30:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930646.50584 Aug 27 11:30:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 135s: evicting client at 10.9.102.19@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0dd5d61f80/0x5d9ee6e658a55986 lrc: 3/0,0 mode: PR/PR res: [0x2c002cd5d:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.102.19@o2ib4 remote: 0x2acf83c4ed165e68 expref: 47 pid: 50448 timeout: 6045602 lvb_type: 0 Aug 27 11:30:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 67 previous similar messages Aug 27 11:31:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930660.23575 Aug 27 11:31:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930662.10583 Aug 27 11:31:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930664.23735 Aug 27 11:31:26 fir-md1-s1 kernel: LustreError: 49471:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk WRITE req@ffff8f32b158e850 x1642311796340000/t0(0) o4->17fa2f85-b498-6aea-0e9b-b4cd8046edb1@10.9.115.10@o2ib4:25/0 lens 488/448 e 0 to 0 dl 1566930715 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:31:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930689.23603 Aug 27 11:31:44 fir-md1-s1 kernel: LustreError: 71845:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 23+4s req@ffff8f2febb18300 x1642596393445696/t0(0) o37->301240a6-b5f4-7a43-10f6-31377a5dff80@10.9.106.19@o2ib4:10/0 lens 448/440 e 0 to 0 dl 1566930700 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 11:31:44 fir-md1-s1 kernel: LustreError: 71845:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 2 previous similar messages 
Aug 27 11:31:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930708.23739 Aug 27 11:32:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930754.10585 Aug 27 11:32:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930765.10588 Aug 27 11:33:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 18de6053-becd-ecdb-63a5-cde170711750 (at 10.9.114.12@o2ib4), client will retry: rc -110 Aug 27 11:33:09 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages Aug 27 11:33:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930789.10333 Aug 27 11:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) in 392 seconds. I think it's dead, and I am evicting it. exp ffff8f06d6c68800, cur 1566930793 expire 1566930643 last 1566930401 Aug 27 11:33:20 fir-md1-s1 kernel: LustreError: 21039:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3452142050 x1631571760447232/t0(0) o3->a2643c51-ed30-6fc6-ba4f-67e217a258b1@10.9.102.5@o2ib4:15/0 lens 488/440 e 0 to 0 dl 1566930825 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:33:20 fir-md1-s1 kernel: LustreError: 21039:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 28 previous similar messages Aug 27 11:33:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930802.23567 Aug 27 11:33:47 fir-md1-s1 kernel: Lustre: 30993:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s req@ffff8f18e2734200 x1642614112212992/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:33:47 fir-md1-s1 kernel: Lustre: 30993:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3003556 previous similar messages Aug 27 11:34:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930853.20725 Aug 27 11:34:25 fir-md1-s1 kernel: Pid: 97668, 
comm: mdt01_107 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:34:25 fir-md1-s1 kernel: Call Trace: Aug 27 11:34:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:34:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:34:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:34:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:34:25 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:34:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:34:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:34:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:34:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:34:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:34:30 fir-md1-s1 kernel: Pid: 23576, comm: mdt00_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:34:30 fir-md1-s1 kernel: Call Trace: Aug 27 11:34:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:34:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:34:30 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:34:30 fir-md1-s1 kernel: [] 
mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:34:30 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:34:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:34:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:34:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:34:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:34:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:34:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930870.23576 Aug 27 11:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to ac80776b-5bd4-cddf-776c-c2f3659b6c51 (at 10.9.101.70@o2ib4) Aug 27 11:34:30 fir-md1-s1 kernel: Lustre: Skipped 27373 previous similar messages Aug 27 11:34:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d8fb76db-f9d5-d4b5-1fe0-ea814c136f26 (at 10.8.18.32@o2ib6) reconnecting Aug 27 11:34:46 fir-md1-s1 kernel: Lustre: Skipped 27044 previous similar messages Aug 27 11:34:55 fir-md1-s1 kernel: Lustre: 20370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (2/2), not sending early reply req@ffff8f1f2cc69b00 x1641722100710272/t0(0) o103->fir-MDT0000-lwp-MDT0003_UUID@10.0.10.52@o2ib7:27/0 lens 328/0 e 0 to 0 dl 1566930897 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:34:55 fir-md1-s1 kernel: Lustre: 20370:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3577392 previous similar messages Aug 27 11:35:03 fir-md1-s1 kernel: Lustre: 25028:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. 
Consider increasing at_early_margin (5)? req@ffff8f3e04d79e00 x1642613897534512/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:1/0 lens 328/0 e 0 to 0 dl 1566930901 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:35:03 fir-md1-s1 kernel: Lustre: 25028:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 855076 previous similar messages Aug 27 11:35:06 fir-md1-s1 kernel: Lustre: 20371:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1352s); client may timeout. req@ffff8f2c40274500 x1642613912534944/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:3/0 lens 328/0 e 0 to 0 dl 1566929553 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:35:06 fir-md1-s1 kernel: Lustre: 20371:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3650197 previous similar messages Aug 27 11:35:06 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 11:35:06 fir-md1-s1 kernel: Lustre: Skipped 21132 previous similar messages Aug 27 11:35:06 fir-md1-s1 kernel: Lustre: 27441:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=2 reqQ=452031 recA=0, svcEst=20, delay=0 Aug 27 11:35:06 fir-md1-s1 kernel: Lustre: 27441:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 21138 previous similar messages Aug 27 11:35:11 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:35:11 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 11:35:14 fir-md1-s1 kernel: Pid: 23737, comm: mdt03_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:35:14 fir-md1-s1 kernel: Call Trace: Aug 27 11:35:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:35:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:35:14 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:35:14 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:35:14 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:35:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:35:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:35:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:35:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:35:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:35:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930914.23737 Aug 27 11:35:14 fir-md1-s1 kernel: LustreError: 10197:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cbe4c09df7d0 vs. 
last_xid 5cbe4c09e1f6f req@ffff8f0f60aa4800 x1631558228113360/t0(0) o101->603ef852-66df-b745-900b-b12995ddbb59@10.9.104.51@o2ib4:14/0 lens 376/0 e 0 to 0 dl 1566930944 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:35:16 fir-md1-s1 kernel: LustreError: 20700:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.106.52@o2ib4: deadline 30:984s ago req@ffff8f193a2db000 x1642590910725424/t0(0) o103->@:22/0 lens 328/0 e 0 to 0 dl 1566929932 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:35:16 fir-md1-s1 kernel: LustreError: 20700:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 76095 previous similar messages Aug 27 11:35:21 fir-md1-s1 kernel: Lustre: 20225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566930900/real 1566930900] req@ffff8f16bb44c800 x1636782180542768/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566930921 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 11:35:21 fir-md1-s1 kernel: Lustre: 20225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4210 previous similar messages Aug 27 11:35:25 fir-md1-s1 kernel: Pid: 23729, comm: mdt03_105 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:35:25 fir-md1-s1 kernel: Call Trace: Aug 27 11:35:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:35:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:35:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:35:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:35:25 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:35:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:35:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 
27 11:35:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:35:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:35:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:35:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:35:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930925.23729 Aug 27 11:35:26 fir-md1-s1 kernel: Pid: 23614, comm: mdt03_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:35:27 fir-md1-s1 kernel: Call Trace: Aug 27 11:35:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:35:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:35:27 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:35:27 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:35:27 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:35:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:35:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:35:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:35:27 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:35:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:35:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930927.22005 Aug 27 11:35:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930938.50448 Aug 27 11:35:43 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 236fc160-cd51-8815-c2bd-b00675450148 (at 10.9.109.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f14d8cac000, cur 1566930943 expire 1566930793 last 1566930716 Aug 27 11:36:14 fir-md1-s1 kernel: LustreError: 86542:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0b64dd0f00 x1636782181487904/t0(0) o105->fir-MDT0002@10.8.27.20@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:36:14 fir-md1-s1 kernel: LustreError: 86542:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 18 previous similar messages Aug 27 11:36:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 11:36:21 fir-md1-s1 kernel: Lustre: Skipped 172 previous similar messages Aug 27 11:36:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566930992.50582 Aug 27 11:36:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931013.10145 Aug 27 11:36:56 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:36:56 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Aug 27 11:37:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 154 seconds. I think it's dead, and I am evicting it. 
exp ffff8f06410fb400, cur 1566931020 expire 1566930870 last 1566930866 Aug 27 11:37:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931020.23574 Aug 27 11:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 3ad7febb-b12e-83e9-5d00-643d11a63aab (at 10.9.103.20@o2ib4), client will retry: rc = -110 Aug 27 11:37:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 27 11:37:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931041.23701 Aug 27 11:37:36 fir-md1-s1 kernel: LustreError: 10146:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566930966, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f33df91f080/0x5d9ee6e658ea6c51 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 11 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e658ea6d46 expref: -99 pid: 10146 timeout: 0 lvb_type: 0 Aug 27 11:37:36 fir-md1-s1 kernel: LustreError: 10146:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 27 11:37:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931062.21429 Aug 27 11:38:08 fir-md1-s1 kernel: LustreError: 97660:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566930997, 91s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f353daa6e40/0x5d9ee6e658ed465e lrc: 3/0,1 mode: --/PW res: [0x200029f0d:0x1e710:0x0].0x0 bits 0x40/0x0 rrc: 81 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 97660 timeout: 0 lvb_type: 0 Aug 27 11:38:08 fir-md1-s1 kernel: LustreError: 97660:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 139 previous similar messages Aug 27 11:38:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931108.23605 Aug 27 11:38:47 fir-md1-s1 kernel: LustreError: 
dumping log to /tmp/lustre-log.1566931127.21430 Aug 27 11:39:27 fir-md1-s1 kernel: LNet: Service thread pid 10146 was inactive for 200.89s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 11:39:27 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 11:39:27 fir-md1-s1 kernel: Pid: 10146, comm: mdt02_036 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:39:27 fir-md1-s1 kernel: Call Trace: Aug 27 11:39:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:39:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:39:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:39:27 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:39:27 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:39:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:39:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:39:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:39:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:39:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:39:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:39:27 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1566931167.10146 Aug 27 11:39:34 fir-md1-s1 kernel: Pid: 27320, comm: mdt00_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:39:34 fir-md1-s1 kernel: Call Trace: Aug 27 11:39:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_intent_open+0x82/0x350 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:39:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:39:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:39:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:39:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:39:46 fir-md1-s1 kernel: Pid: 10584, comm: mdt03_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:39:46 fir-md1-s1 kernel: Call Trace: Aug 27 11:39:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:39:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:39:46 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:39:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:39:46 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:39:46 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:39:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:39:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:39:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:39:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:39:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:39:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:39:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:39:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:39:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:39:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:39:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931187.10584 Aug 27 11:39:49 fir-md1-s1 kernel: LustreError: 23576:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f3e1e695800 ns: mdt-fir-MDT0000_UUID lock: ffff8f0ffdcbf500/0x5d9ee6e658bed1c0 lrc: 3/0,0 mode: PW/PW res: [0x200029f0d:0x1e710:0x0].0x0 bits 0x40/0x0 rrc: 79 type: IBT flags: 0x50200400000020 nid: 10.9.101.3@o2ib4 remote: 0x75cb9a0567510c78 expref: 5 pid: 23576 timeout: 0 lvb_type: 0 Aug 27 11:39:49 fir-md1-s1 kernel: LustreError: 23576:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 84 previous similar messages Aug 27 11:39:50 fir-md1-s1 kernel: LNet: Service thread pid 23576 completed after 520.90s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 27 11:39:50 fir-md1-s1 kernel: LNet: Skipped 40 previous similar messages Aug 27 11:39:58 fir-md1-s1 kernel: Pid: 97660, comm: mdt01_099 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:39:58 fir-md1-s1 kernel: Call Trace: Aug 27 11:39:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:39:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:39:58 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:39:58 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:39:58 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:39:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:39:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:39:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:39:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:39:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:39:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931198.97660 Aug 27 11:40:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.82@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 11:40:14 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 11:40:14 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (5): c: 5, oc: 0, rc: 8 Aug 27 11:40:15 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 2 seconds Aug 27 11:40:15 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 81 previous similar messages Aug 27 11:40:15 fir-md1-s1 kernel: LNetError: 20382:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.24.18@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:40:15 fir-md1-s1 kernel: LNetError: 20382:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 70 previous similar messages Aug 27 11:40:15 fir-md1-s1 kernel: LNetError: 55553:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.22.34@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:40:20 fir-md1-s1 kernel: LNetError: 55553:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 20 previous similar messages Aug 27 11:40:20 fir-md1-s1 kernel: LNetError: 20382:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.30.9@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:40:20 fir-md1-s1 kernel: LNetError: 20382:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 13 previous similar messages Aug 27 11:40:21 fir-md1-s1 kernel: LustreError: 20958:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f44b88b5400 x1641917137505168/t0(0) o37->211c4417-cd43-2e0b-9f03-69995281dc54@10.9.104.1@o2ib4:3/0 lens 448/440 e 0 to 0 dl 1566931233 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 11:40:21 fir-md1-s1 kernel: LustreError: 20958:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 98 previous similar messages Aug 27 11:40:22 fir-md1-s1 kernel: Pid: 20725, comm: mdt01_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 
11:40:22 fir-md1-s1 kernel: Call Trace: Aug 27 11:40:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:40:22 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:40:22 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 11:40:22 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 11:40:22 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 11:40:22 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:40:22 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:40:22 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:40:22 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:40:22 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:40:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931222.20725 Aug 27 11:40:43 fir-md1-s1 kernel: LNet: Service thread pid 23613 was inactive for 200.66s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 27 11:40:43 fir-md1-s1 kernel: LNet: Skipped 50 previous similar messages Aug 27 11:40:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931243.23613 Aug 27 11:40:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931245.23660 Aug 27 11:40:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931247.23760 Aug 27 11:40:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931254.23635 Aug 27 11:40:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 65s: evicting client at 10.9.0.62@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f11fb102f40/0x5d9ee6e658f4b3be lrc: 3/0,0 mode: PR/PR res: [0x2c002c8a0:0x16ad4:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.0.62@o2ib4 remote: 0x2d1dd34919baaed8 expref: 1850 pid: 23701 timeout: 6046297 lvb_type: 0 Aug 27 11:40:58 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 81 previous similar messages Aug 27 11:41:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931263.23632 Aug 27 11:41:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931284.24576 Aug 27 11:41:25 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds Aug 27 11:41:25 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 31 previous similar messages Aug 27 11:41:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931286.22280 Aug 27 11:41:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 11:41:46 fir-md1-s1 kernel: LustreError: 55539:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 23+4s req@ffff8f2714227450 x1638905708470496/t0(0) o256->43878516-5cef-416d-7ffc-dc295c573105@10.9.113.15@o2ib4:12/0 lens 304/240 e 0 to 0 dl 1566931302 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:41:46 fir-md1-s1 kernel: LustreError: 55539:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Aug 27 11:42:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931344.23580 Aug 27 11:42:39 fir-md1-s1 kernel: LustreError: 71818:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f2831fe9200 x1638785395545424/t0(0) o37->1da41a6c-1ec7-44aa-8786-ef5711548b55@10.9.116.2@o2ib4:8/0 lens 448/440 e 0 to 0 dl 1566931388 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:42:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931374.21322 Aug 27 11:43:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with f0a8fbb7-06c4-ed16-a94f-6cea310ceb29 (at 10.8.0.82@o2ib6), client will retry: rc -110 Aug 27 11:43:14 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Aug 27 11:43:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931400.97655 Aug 27 11:43:47 fir-md1-s1 kernel: Lustre: 48116:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 74s req@ffff8f2153b22d00 x1642614178047632/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:43:49 fir-md1-s1 kernel: Lustre: 48116:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2616685 previous similar messages Aug 27 11:43:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931435.10144 Aug 27 11:44:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931440.21677 Aug 27 11:44:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931459.21371 Aug 27 11:44:22 
fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931462.20734 Aug 27 11:44:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931465.10364 Aug 27 11:44:29 fir-md1-s1 kernel: Pid: 24579, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:44:29 fir-md1-s1 kernel: Call Trace: Aug 27 11:44:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:44:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:44:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:44:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:44:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:44:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:44:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:44:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:44:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:44:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:44:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931470.24579 Aug 27 11:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a2ca5419-17ca-256b-703c-5b96f1145d62 (at 10.0.10.103@o2ib7) Aug 27 11:44:31 fir-md1-s1 kernel: Lustre: Skipped 29155 previous similar messages Aug 27 11:44:39 fir-md1-s1 kernel: Pid: 10147, comm: mdt03_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 
2018 Aug 27 11:44:39 fir-md1-s1 kernel: Call Trace: Aug 27 11:44:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:44:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:44:39 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:44:39 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:44:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:44:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:44:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:44:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:44:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:44:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931478.10147 Aug 27 11:44:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Aug 27 11:44:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 5 previous similar messages Aug 27 11:44:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.210@o2ib7 (6): c: 7, oc: 0, rc: 7 Aug 27 11:44:46 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 5 previous similar messages Aug 27 11:44:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
fir-MDT0000-lwp-OST000e_UUID (at 10.0.10.103@o2ib7) reconnecting Aug 27 11:44:48 fir-md1-s1 kernel: Lustre: Skipped 28238 previous similar messages Aug 27 11:44:51 fir-md1-s1 kernel: Pid: 10332, comm: mdt03_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:44:51 fir-md1-s1 kernel: Call Trace: Aug 27 11:44:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:44:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:44:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:44:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:44:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:44:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:44:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:44:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:44:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:44:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931491.10332 Aug 27 11:44:52 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 0 seconds Aug 27 11:44:52 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 1 previous similar message Aug 27 11:44:56 fir-md1-s1 kernel: Pid: 23735, comm: mdt03_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 
14:50:35 PST 2018 Aug 27 11:44:56 fir-md1-s1 kernel: Call Trace: Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:44:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:44:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:44:56 fir-md1-s1 kernel: Pid: 20731, comm: mdt01_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:44:56 fir-md1-s1 kernel: Call Trace: Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:44:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 
11:44:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:44:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:44:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:44:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:44:56 fir-md1-s1 kernel: Lustre: 20372:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2b439b1b00 x1642826875103616/t0(0) o103->4006b48d-2848-e7d5-0c1a-60f041aa998b@10.8.28.9@o2ib6:1/0 lens 328/0 e 0 to 0 dl 1566931501 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:44:56 fir-md1-s1 kernel: Lustre: 20372:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1859611 previous similar messages Aug 27 11:45:03 fir-md1-s1 kernel: Lustre: 46812:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f1cbe7d3000 x1638101793809776/t0(0) o103->f0a8fbb7-06c4-ed16-a94f-6cea310ceb29@10.8.0.82@o2ib6:2/0 lens 328/0 e 0 to 0 dl 1566931502 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:45:03 fir-md1-s1 kernel: Lustre: 46812:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 957297 previous similar messages Aug 27 11:45:05 fir-md1-s1 kernel: Lustre: 21381:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:763s); client may timeout. 
req@ffff8f1c9b9a4800 x1636661861080192/t0(0) o103->fir-MDT0000-lwp-OST002a_UUID@10.0.10.107@o2ib7:22/0 lens 328/0 e 0 to 0 dl 1566930742 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:45:05 fir-md1-s1 kernel: Lustre: 21381:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 207669 previous similar messages Aug 27 11:45:06 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 11:45:06 fir-md1-s1 kernel: Lustre: 22894:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=4 reqQ=224004 recA=0, svcEst=20, delay=0 Aug 27 11:45:06 fir-md1-s1 kernel: Lustre: 22894:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 30709 previous similar messages Aug 27 11:45:06 fir-md1-s1 kernel: Lustre: Skipped 30709 previous similar messages Aug 27 11:45:20 fir-md1-s1 kernel: LustreError: 25086:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.34@o2ib6: deadline 30:1s ago req@ffff8f275b616900 x1641928344018816/t0(0) o102->af04ed26-0e4b-db45-3414-20245014a46d@10.8.27.34@o2ib6:19/0 lens 328/0 e 0 to 0 dl 1566931519 ref 1 fl Interpret:H/2/ffffffff rc 0/-1 Aug 27 11:45:20 fir-md1-s1 kernel: LustreError: 25086:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1071 previous similar messages Aug 27 11:45:34 fir-md1-s1 kernel: Lustre: 20209:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566931502/real 1566931502] req@ffff8f0b380a4b00 x1636782181509504/t0(0) o103->fir-MDT0000-osp-MDT0002@0@lo:17/18 lens 328/224 e 0 to 1 dl 1566931533 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 11:45:34 fir-md1-s1 kernel: Lustre: 20209:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4984 previous similar messages Aug 27 11:45:40 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Aug 27 11:45:40 fir-md1-s1 kernel: LNetError: 
20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message Aug 27 11:45:40 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (5): c: 0, oc: 0, rc: 8 Aug 27 11:45:40 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message Aug 27 11:45:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931557.22285 Aug 27 11:45:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Aug 27 11:45:58 fir-md1-s1 kernel: LNetError: 20188:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Aug 27 11:46:19 fir-md1-s1 kernel: LustreError: 71884:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f445c0c2c50 x1633897170242640/t0(0) o37->99661aab-9554-9a66-a9ba-0efac2d490ec@10.9.101.5@o2ib4:5/0 lens 448/440 e 0 to 0 dl 1566931595 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:46:19 fir-md1-s1 kernel: LustreError: 71884:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 8 previous similar messages Aug 27 11:46:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 11:46:21 fir-md1-s1 kernel: Lustre: Skipped 946 previous similar messages Aug 27 11:46:48 fir-md1-s1 kernel: LustreError: 107739:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f199c049e00 x1636782181728560/t0(0) o105->fir-MDT0000@10.9.107.52@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:46:48 fir-md1-s1 kernel: LustreError: 107739:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 8 previous similar messages Aug 27 11:47:09 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete 
Aug 27 11:47:09 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Aug 27 11:48:11 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566931600, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f1d66ef8b40/0x5d9ee6e65926661b lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 14 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e659266ec6 expref: -99 pid: 20555 timeout: 0 lvb_type: 0 Aug 27 11:48:11 fir-md1-s1 kernel: LustreError: 20555:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 27 11:48:16 fir-md1-s1 kernel: LustreError: 23679:0:(ldlm_lockd.c:1285:ldlm_handle_enqueue0()) ### lock on disconnected export ffff8f2d855ebc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e462518c0/0x5d9ee6e659378c17 lrc: 2/0,0 mode: --/CR res: [0x2c002cc27:0x67da:0x0].0x0 bits 0x0/0x0 rrc: 7 type: IBT flags: 0x40000000000000 nid: local remote: 0xac353b35578b316a expref: -99 pid: 23679 timeout: 0 lvb_type: 0 Aug 27 11:48:19 fir-md1-s1 kernel: LustreError: 21414:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566931608, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f34251ba400/0x5d9ee6e6592726b0 lrc: 3/0,1 mode: --/CW res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x2/0x0 rrc: 20 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21414 timeout: 0 lvb_type: 0 Aug 27 11:48:19 fir-md1-s1 kernel: LustreError: 21414:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 187 previous similar messages Aug 27 11:48:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f10c81cd000, cur 1566931706 expire 1566931556 last 1566931479 Aug 27 11:48:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 681d472f-4fd3-a063-620e-44b94a766d00 (at 10.9.103.16@o2ib4), client will retry: rc = -110 Aug 27 11:48:30 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Aug 27 11:49:59 fir-md1-s1 kernel: LustreError: 23692:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f23803abc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f349d938240/0x5d9ee6e6593aac55 lrc: 3/0,0 mode: EX/EX res: [0x2c002cd82:0x3:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x50000000000000 nid: 10.8.8.31@o2ib6 remote: 0xf691076e572e80ef expref: 10 pid: 23692 timeout: 0 lvb_type: 3 Aug 27 11:49:59 fir-md1-s1 kernel: LustreError: 23692:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 90 previous similar messages Aug 27 11:50:01 fir-md1-s1 kernel: LNet: Service thread pid 20555 was inactive for 200.29s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 11:50:01 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 11:50:01 fir-md1-s1 kernel: Pid: 20555, comm: mdt01_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:50:01 fir-md1-s1 kernel: Call Trace: Aug 27 11:50:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:50:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 11:50:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 11:50:01 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 11:50:01 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:50:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:50:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:50:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:50:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:50:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:50:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:50:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931801.20555 Aug 27 11:50:25 fir-md1-s1 kernel: LustreError: 21894:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1637bc0000 x1641917137695696/t0(0) 
o37->211c4417-cd43-2e0b-9f03-69995281dc54@10.9.104.1@o2ib4:4/0 lens 448/440 e 0 to 0 dl 1566931834 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 11:50:25 fir-md1-s1 kernel: LustreError: 21894:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 97 previous similar messages Aug 27 11:50:34 fir-md1-s1 kernel: Pid: 23753, comm: mdt02_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:50:34 fir-md1-s1 kernel: Call Trace: Aug 27 11:50:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:50:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:50:35 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:50:35 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:50:35 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:50:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:50:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:50:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:50:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:50:35 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:50:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931835.23753 Aug 27 11:50:38 fir-md1-s1 kernel: Pid: 25681, comm: mdt00_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:50:38 fir-md1-s1 kernel: Call Trace: Aug 27 11:50:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 
11:50:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:50:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:50:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:50:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:50:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:50:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:50:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:50:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:50:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:50:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:50:39 fir-md1-s1 kernel: Pid: 21371, comm: mdt02_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:50:39 fir-md1-s1 kernel: Call Trace: Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] 
Aug 27 11:50:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:50:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:50:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:50:39 fir-md1-s1 kernel: Pid: 50580, comm: mdt02_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:50:39 fir-md1-s1 kernel: Call Trace: Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:50:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:50:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:50:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:50:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931841.97651 Aug 27 11:50:48 fir-md1-s1 
kernel: LNet: Service thread pid 10308 completed after 213.83s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 11:50:48 fir-md1-s1 kernel: LNet: Skipped 42 previous similar messages Aug 27 11:50:49 fir-md1-s1 kernel: LNet: Service thread pid 21671 was inactive for 200.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 11:50:49 fir-md1-s1 kernel: LNet: Skipped 46 previous similar messages Aug 27 11:50:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931848.21671 Aug 27 11:50:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931852.23655 Aug 27 11:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 95cdb8fb-0e32-cb98-88bc-c0e9f3ec6a0b (at 10.9.109.57@o2ib4) in 1154 seconds. I think it's dead, and I am evicting it. exp ffff8f4537982c00, cur 1566931870 expire 1566931720 last 1566930716 Aug 27 11:51:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.103.23@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f0b7819e540/0x5d9ee6e6593c165d lrc: 3/0,0 mode: PR/PR res: [0x2c002cd2b:0x17:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.103.23@o2ib4 remote: 0xc754b064adfbcab8 expref: 57 pid: 10307 timeout: 6046954 lvb_type: 0 Aug 27 11:51:35 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 71 previous similar messages Aug 27 11:51:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931899.23627 Aug 27 11:52:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.110.39@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 11:52:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931955.23721 Aug 27 11:52:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566931957.21436 Aug 27 11:52:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 11:52:51 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.212@o2ib7 (4): c: 7, oc: 2, rc: 7 Aug 27 11:52:51 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f23432dac00 Aug 27 11:52:51 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 5 seconds Aug 27 11:52:51 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 37 previous similar messages Aug 27 11:53:26 fir-md1-s1 kernel: LustreError: 71819:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+13s req@ffff8f2b52264e00 x1641916375407856/t0(0) o37->fc4076f1-0bfd-2b0a-8710-4b2a9ef8582d@10.9.104.3@o2ib4:13/0 lens 448/440 e 0 to 0 dl 1566931993 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:53:26 fir-md1-s1 kernel: LustreError: 71819:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Aug 27 11:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 6bb1b23c-28f8-153d-8cc1-2ff0115f9167 (at 10.9.106.58@o2ib4), client will retry: rc -107 Aug 27 11:53:27 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Aug 27 11:53:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 95cdb8fb-0e32-cb98-88bc-c0e9f3ec6a0b (at 10.9.109.57@o2ib4) in 1294 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4516ed5c00, cur 1566932016 expire 1566931866 last 1566930722 Aug 27 11:53:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 11:53:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932016.20465 Aug 27 11:53:47 fir-md1-s1 kernel: Lustre: 30994:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 18s req@ffff8f0b292de300 x1642614204083552/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 11:53:47 fir-md1-s1 kernel: Lustre: 30994:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3458588 previous similar messages Aug 27 11:54:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932048.23598 Aug 27 11:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 320d2212-24ba-6a9d-b9a0-9c33ec72e105 (at 10.9.107.64@o2ib4) Aug 27 11:54:30 fir-md1-s1 kernel: Lustre: Skipped 30590 previous similar messages Aug 27 11:54:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932070.23730 Aug 27 11:54:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d80129c2-0e3c-4dab-61df-4121beba5d58 (at 10.8.27.10@o2ib6) reconnecting Aug 27 11:54:47 fir-md1-s1 kernel: Lustre: Skipped 29623 previous similar messages Aug 27 11:54:56 fir-md1-s1 kernel: Lustre: 21285:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (0/-10), not sending early reply req@ffff8f07203e2400 x1636659203694144/t0(0) o103->fir-MDT0000-lwp-OST001b_UUID@10.0.10.106@o2ib7:26/0 lens 328/0 e 0 to 0 dl 1566932096 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:54:56 fir-md1-s1 kernel: Lustre: 21285:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2112145 previous similar messages Aug 27 11:55:03 fir-md1-s1 kernel: Lustre: 25076:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-49s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f1a39c08c00 x1642614194808816/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:14/0 lens 328/0 e 0 to 0 dl 1566932054 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 11:55:03 fir-md1-s1 kernel: Lustre: 25076:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1593458 previous similar messages Aug 27 11:55:04 fir-md1-s1 kernel: Pid: 23655, comm: mdt03_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:55:04 fir-md1-s1 kernel: Call Trace: Aug 27 11:55:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:55:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:55:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:55:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:55:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:55:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:55:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:55:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:55:05 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:55:05 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:55:05 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:55:05 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:55:05 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:55:05 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:55:05 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:55:05 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:55:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932104.23655 Aug 27 11:55:05 fir-md1-s1 kernel: Lustre: 22136:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2149s); client may timeout. 
req@ffff8f1df9f52a00 x1631570647168224/t0(0) o103->36978ce1-8003-18d4-3f72-d4a06285b8af@10.9.101.61@o2ib4:16/0 lens 328/0 e 0 to 0 dl 1566929956 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 11:55:05 fir-md1-s1 kernel: Lustre: 22136:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 752024 previous similar messages Aug 27 11:55:06 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 11:55:07 fir-md1-s1 kernel: Lustre: 25074:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=117 reqQ=343760 recA=0, svcEst=20, delay=241 Aug 27 11:55:07 fir-md1-s1 kernel: Lustre: 25074:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 49331 previous similar messages Aug 27 11:55:07 fir-md1-s1 kernel: Lustre: Skipped 49352 previous similar messages Aug 27 11:55:07 fir-md1-s1 kernel: Pid: 23699, comm: mdt03_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:55:07 fir-md1-s1 kernel: Call Trace: Aug 27 11:55:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:55:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:55:07 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:55:07 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:55:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:55:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 
[ptlrpc] Aug 27 11:55:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:55:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:55:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:55:09 fir-md1-s1 kernel: Pid: 23658, comm: mdt03_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:55:09 fir-md1-s1 kernel: Call Trace: Aug 27 11:55:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:55:09 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:55:09 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:55:09 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:55:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:55:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:55:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:55:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:55:09 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:55:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932109.23658 Aug 27 11:55:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 11:55:15 fir-md1-s1 kernel: Pid: 20952, comm: mdt03_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:55:15 fir-md1-s1 kernel: Call Trace: Aug 27 11:55:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:55:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:55:15 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 11:55:15 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 11:55:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 11:55:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:55:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:55:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:55:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:55:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932115.20952 Aug 27 11:55:27 fir-md1-s1 kernel: LustreError: 20372:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.116.6@o2ib4: deadline 30:2559s ago req@ffff8f21bb9b4500 x1642338580032176/t0(0) o103->@:18/0 lens 328/0 e 0 to 0 dl 1566929568 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 11:55:27 fir-md1-s1 kernel: LustreError: 20372:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 288 previous similar messages Aug 27 11:55:38 fir-md1-s1 kernel: Lustre: 
20219:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566932117/real 1566932117] req@ffff8f24d513ce00 x1636782180543632/t0(0) o103->fir-MDT0000-lwp-MDT0002@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566932138 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 11:55:38 fir-md1-s1 kernel: Lustre: 20219:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3724 previous similar messages Aug 27 11:55:44 fir-md1-s1 kernel: Pid: 23751, comm: mdt02_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 11:55:44 fir-md1-s1 kernel: Call Trace: Aug 27 11:55:44 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x890 [ptlrpc] Aug 27 11:55:44 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 11:55:44 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 11:55:44 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 11:55:44 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 11:55:44 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 11:55:44 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 11:55:44 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 11:55:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932144.23751 Aug 27 11:56:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932174.10196 Aug 27 11:56:19 
fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Aug 27 11:56:19 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.210@o2ib7 (0): c: 0, oc: 0, rc: 7 Aug 27 11:56:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.102.45@o2ib4, removing former export from same NID Aug 27 11:56:30 fir-md1-s1 kernel: Lustre: Skipped 421 previous similar messages Aug 27 11:56:31 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f30f4f37600 Aug 27 11:56:31 fir-md1-s1 kernel: LNetError: 55538:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.24.17@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:56:31 fir-md1-s1 kernel: LNetError: 55538:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 391 previous similar messages Aug 27 11:56:31 fir-md1-s1 kernel: LustreError: 20190:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f30f4e1ee00 Aug 27 11:56:32 fir-md1-s1 kernel: LNetError: 21265:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.26.14@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:56:32 fir-md1-s1 kernel: LNetError: 21265:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 14 previous similar messages Aug 27 11:56:33 fir-md1-s1 kernel: LNetError: 23622:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.30.26@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:56:33 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 11:56:33 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Aug 27 11:56:33 fir-md1-s1 kernel: LNetError: 23622:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 38 previous similar messages Aug 27 11:56:35 fir-md1-s1 kernel: LNetError: 
23601:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.18.17@o2ib6 from 10.0.10.51@o2ib7 Aug 27 11:56:35 fir-md1-s1 kernel: LNetError: 23601:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 29 previous similar messages Aug 27 11:56:38 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.107@o2ib7: accepting Aug 27 11:56:39 fir-md1-s1 kernel: LustreError: 23607:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.103.39@o2ib4) failed to reply to blocking AST (req@ffff8f2c496da700 x1636782181950112 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8f1f4d41cc80/0x5d9ee6e65966529a lrc: 4/0,0 mode: PR/PR res: [0x200029972:0x225c:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.103.39@o2ib4 remote: 0x3063042c5bb3dc6 expref: 222 pid: 24582 timeout: 6047280 lvb_type: 0 Aug 27 11:56:39 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.103.39@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 27 11:56:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932200.23644 Aug 27 11:56:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932204.97671 Aug 27 11:56:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932209.23583 Aug 27 11:56:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.18.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 11:56:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932214.24577 Aug 27 11:57:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 11:57:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 27 11:57:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932224.20459 Aug 27 11:57:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932235.23579 Aug 27 11:57:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f065b13ac00, cur 1566932236 expire 1566932086 last 1566932009 Aug 27 11:57:22 fir-md1-s1 kernel: LustreError: 23687:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f1712ebdd00 x1636782181990496/t0(0) o104->fir-MDT0000@10.9.107.33@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 11:57:22 fir-md1-s1 kernel: LustreError: 23687:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 12 previous similar messages Aug 27 11:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 11:57:37 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Aug 27 11:57:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932272.20464 Aug 27 11:57:58 fir-md1-s1 kernel: LustreError: 46588:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1a1aaf3850 x1642973051932832/t0(0) o3->6282e924-823c-ee43-6de9-1b6a734cef6f@10.8.0.67@o2ib6:27/0 lens 488/440 e 0 to 0 dl 1566932307 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 11:57:58 fir-md1-s1 kernel: LustreError: 46588:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 30 previous similar messages Aug 27 11:58:33 fir-md1-s1 kernel: LustreError: 23600:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566932222, 91s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff8f0859afd340/0x5d9ee6e65967e1d9 lrc: 3/1,0 mode: --/PR res: [0x20002a1e1:0xea:0x0].0x0 bits 0x13/0x0 rrc: 41 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23600 timeout: 0 lvb_type: 0 Aug 27 11:58:33 fir-md1-s1 kernel: LustreError: 23600:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 527 previous similar messages Aug 27 11:58:41 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 13 seconds Aug 27 11:58:41 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 121 previous similar messages Aug 27 11:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 7372bd8e-4f77-9af0-e0f4-c1915e510b36 (at 10.9.103.22@o2ib4), client will retry: rc = -110 Aug 27 11:59:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 27 12:00:03 fir-md1-s1 kernel: LustreError: 50581:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5d1f996785390 vs. last_xid 5d1f99678888f req@ffff8f39858d2a00 x1638244785083280/t0(0) o101->8ec1acae-5541-1224-6330-34435f948ba9@10.9.106.61@o2ib4:3/0 lens 576/0 e 0 to 0 dl 1566932433 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:00:23 fir-md1-s1 kernel: LustreError: 23735:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2520239c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2333764ec0/0x5d9ee6e659725966 lrc: 3/0,0 mode: PR/PR res: [0x2c002c34d:0xc0f:0x0].0x0 bits 0x13/0x0 rrc: 56 type: IBT flags: 0x50200000000000 nid: 10.9.101.51@o2ib4 remote: 0xc6ba6ec9435c8a74 expref: 15490 pid: 23735 timeout: 0 lvb_type: 0 Aug 27 12:00:23 fir-md1-s1 kernel: LustreError: 23735:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 237 previous similar messages Aug 27 12:00:30 fir-md1-s1 kernel: LNet: Service thread pid 97654 was inactive for 200.25s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 12:00:30 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 12:00:30 fir-md1-s1 kernel: Pid: 97654, comm: mdt01_093 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:00:30 fir-md1-s1 kernel: Call Trace: Aug 27 12:00:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:00:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:00:30 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:00:30 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:00:30 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:00:30 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:00:30 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:00:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:00:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:00:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:00:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:00:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:00:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:00:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932430.97654 Aug 27 12:00:51 fir-md1-s1 kernel: Pid: 97644, comm: mdt01_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:00:51 fir-md1-s1 kernel: Call Trace: Aug 27 12:00:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:00:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:00:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:00:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 
12:00:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:00:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:00:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:00:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:00:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:00:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932451.97644 Aug 27 12:00:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 12:00:57 fir-md1-s1 kernel: LustreError: 71859:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f32905dc500 x1642528341554256/t0(0) o37->d1a8de5f-e132-abf7-7e4b-84b2d20d113d@10.8.8.31@o2ib6:24/0 lens 448/440 e 0 to 0 dl 1566932484 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:00:57 fir-md1-s1 kernel: LustreError: 71859:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 33 previous similar messages Aug 27 12:00:58 fir-md1-s1 kernel: Pid: 20724, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:00:58 fir-md1-s1 kernel: Call Trace: Aug 27 12:00:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:00:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:00:58 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:00:58 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:00:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:00:58 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:00:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:00:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:00:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:00:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:00:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:00:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:00:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:00:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932458.20724 Aug 27 12:01:00 fir-md1-s1 kernel: Pid: 22283, comm: mdt01_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:01:00 fir-md1-s1 kernel: Call Trace: Aug 27 12:01:00 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:01:00 
fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:01:00 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:01:00 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:01:00 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:01:00 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:01:00 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:01:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:01:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:01:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:01:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932460.22283 Aug 27 12:01:01 fir-md1-s1 kernel: Pid: 23663, comm: mdt03_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:01:01 fir-md1-s1 kernel: Call Trace: Aug 27 12:01:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:01:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:01:01 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:01:01 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:01:01 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:01:01 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 
[ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:01:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:01:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:01:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:01:07 fir-md1-s1 kernel: LNet: Service thread pid 23667 was inactive for 200.16s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 12:01:07 fir-md1-s1 kernel: LNet: Skipped 72 previous similar messages Aug 27 12:01:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932467.23667 Aug 27 12:01:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932480.23620 Aug 27 12:01:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 12:01:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 7 previous similar messages Aug 27 12:01:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (5): c: 2, oc: 0, rc: 8 Aug 27 12:01:23 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 7 previous similar messages Aug 27 12:01:38 fir-md1-s1 kernel: Lustre: 46540:0:(tgt_handler.c:562:tgt_handle_recovery()) @@@ rq_xid 1640779230080832 matches saved xid, expected REPLAY or RESENT flag (0) req@ffff8f3951b7e050 x1640779230080832/t0(0) o4->c74cabd5-45b1-86e5-60f0-8f68b07a88b1@10.9.103.24@o2ib4:6/0 lens 808/0 e 0 to 0 dl 1566932526 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 12:01:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 95s: 
evicting client at 10.8.0.68@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f26c9f272c0/0x5d9ee6e6597ea0ac lrc: 3/0,0 mode: PR/PR res: [0x2c002cd0f:0x1e:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.0.68@o2ib6 remote: 0x1ffabefaba942d6f expref: 228 pid: 10144 timeout: 6047502 lvb_type: 0 Aug 27 12:01:48 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 218 previous similar messages Aug 27 12:01:48 fir-md1-s1 kernel: LNet: Service thread pid 23759 completed after 2583.96s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 12:01:48 fir-md1-s1 kernel: LNet: Skipped 68 previous similar messages Aug 27 12:03:47 fir-md1-s1 kernel: Lustre: 30991:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 56s req@ffff8f43e5488450 x1642614205951168/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:03:47 fir-md1-s1 kernel: Lustre: 30991:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2544882 previous similar messages Aug 27 12:03:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932628.97652 Aug 27 12:03:50 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f26d1c9b000 Aug 27 12:04:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 12:04:06 fir-md1-s1 kernel: LustreError: 21539:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 14+2s req@ffff8f2925f47450 x1634183748013632/t0(0) o3->1a643088-ea7a-3acd-f835-98d006253e47@10.8.20.19@o2ib6:3/0 lens 488/440 e 0 to 0 dl 1566932643 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:04:08 fir-md1-s1 kernel: LustreError: 21539:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 6 previous similar messages Aug 27 12:04:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 1a643088-ea7a-3acd-f835-98d006253e47 (at 10.8.20.19@o2ib6), client will retry: rc -110 Aug 27 12:04:08 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Aug 27 12:04:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.27.22@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:04:30 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 27 12:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6b62867e-691f-9446-0cee-c49107cf4650 (at 10.9.107.53@o2ib4) Aug 27 12:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to de956b26-1e75-394a-993b-9ba5090fa4fd (at 10.9.101.62@o2ib4) Aug 27 12:04:31 fir-md1-s1 kernel: Lustre: Skipped 30958 previous similar messages Aug 27 12:04:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 12:04:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932671.23714 Aug 27 12:04:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7b47d238-4d96-3180-1efb-43deab0e7ece (at 10.8.24.19@o2ib6) reconnecting Aug 27 12:04:46 fir-md1-s1 kernel: Lustre: Skipped 28549 previous similar messages Aug 27 12:04:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932686.20723 Aug 27 12:04:56 fir-md1-s1 kernel: Lustre: 22007:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply 
req@ffff8f17bef4c200 x1642435536254144/t0(0) o101->e03c9616-54a9-c71a-abb6-e07903704c3a@10.9.110.15@o2ib4:1/0 lens 584/3264 e 0 to 0 dl 1566932701 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 12:04:56 fir-md1-s1 kernel: Lustre: 22007:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1405172 previous similar messages Aug 27 12:05:03 fir-md1-s1 kernel: Lustre: 36768:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-112s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f13d5529c50 x1631597852691616/t0(0) o103->@:11/0 lens 328/0 e 0 to 0 dl 1566932591 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:05:03 fir-md1-s1 kernel: Lustre: 36768:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1334020 previous similar messages Aug 27 12:05:06 fir-md1-s1 kernel: Lustre: 25549:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2292s); client may timeout. req@ffff8f0e03cb9200 x1642476567597792/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:23/0 lens 336/0 e 0 to 0 dl 1566930413 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:05:06 fir-md1-s1 kernel: Lustre: 25549:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 248490 previous similar messages Aug 27 12:05:06 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 12:05:06 fir-md1-s1 kernel: Lustre: 25028:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=22 reqQ=113277 recA=0, svcEst=20, delay=0 Aug 27 12:05:06 fir-md1-s1 kernel: Lustre: 25028:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 37008 previous similar messages Aug 27 12:05:06 fir-md1-s1 kernel: Lustre: Skipped 37002 previous similar messages Aug 27 12:05:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932721.23677 Aug 27 12:05:31 fir-md1-s1 kernel: LustreError: 31012:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.103.30@o2ib4: deadline 30:3148s ago req@ffff8f0ce0faec00 x1631746184347840/t0(0) o103->@:3/0 lens 328/0 e 0 to 0 dl 1566929583 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:05:31 fir-md1-s1 kernel: LustreError: 31012:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 368 previous similar messages Aug 27 12:05:39 fir-md1-s1 kernel: Lustre: 20223:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566932718/real 1566932718] req@ffff8f24d513e000 x1636782180543696/t0(0) o103->fir-MDT0000-lwp-MDT0002@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566932739 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 12:05:39 fir-md1-s1 kernel: Lustre: 20223:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7613 previous similar messages Aug 27 12:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 12:06:22 fir-md1-s1 kernel: Lustre: Skipped 2216 previous similar messages Aug 27 12:06:46 fir-md1-s1 kernel: Pid: 23741, comm: mdt02_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:06:54 fir-md1-s1 kernel: Call Trace: Aug 27 12:06:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] 
mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:06:54 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:06:54 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:06:54 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:06:54 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:06:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:06:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:06:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:06:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932806.23741 Aug 27 12:06:54 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [ldlm_cn03_010:23042] Aug 27 12:06:54 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses enclosure dm_multipath ipmi_si pcspkr dm_mod ipmi_devintf k10temp ccp sg i2c_piix4 ipmi_msghandler acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif Aug 27 12:06:54 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx5_core(OE) libahci mlxfw(OE) 
crct10dif_pclmul devlink crct10dif_common tg3 mlx_compat(OE) drm ptp libata megaraid_sas crc32c_intel drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas Aug 27 12:06:54 fir-md1-s1 kernel: CPU: 7 PID: 23042 Comm: ldlm_cn03_010 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 Aug 27 12:06:54 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 Aug 27 12:06:54 fir-md1-s1 kernel: task: ffff8f153c77e180 ti: ffff8f126da88000 task.ti: ffff8f126da88000 Aug 27 12:06:54 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Aug 27 12:06:54 fir-md1-s1 kernel: RSP: 0018:ffff8f126da8bda8 EFLAGS: 00000246 Aug 27 12:06:54 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff8f126da8bd30 RCX: 0000000000390000 Aug 27 12:06:54 fir-md1-s1 kernel: RDX: ffff8f457f49b780 RSI: 0000000000590001 RDI: ffff8f0f78be3a88 Aug 27 12:06:54 fir-md1-s1 kernel: RBP: ffff8f126da8bda8 R08: ffff8f457f45b780 R09: 0000000000000000 Aug 27 12:06:54 fir-md1-s1 kernel: R10: 00000000004c4b40 R11: 0000000000000000 R12: ffffffffc0dc66c0 Aug 27 12:06:54 fir-md1-s1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000014800000000 Aug 27 12:06:54 fir-md1-s1 kernel: FS: 00007f17a30c1700(0000) GS:ffff8f457f440000(0000) knlGS:0000000000000000 Aug 27 12:06:54 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 27 12:06:54 fir-md1-s1 kernel: CR2: 00000000024fc378 CR3: 000000103c36c000 CR4: 00000000003407e0 Aug 27 12:06:54 fir-md1-s1 kernel: Call Trace: Aug 27 12:06:54 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Aug 27 12:06:54 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Aug 27 12:06:54 fir-md1-s1 kernel: [] ptlrpc_server_hpreq_fini+0x68/0x170 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] ptlrpc_main+0xdc0/0x1fc0 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] ? 
ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Aug 27 12:06:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:06:54 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Aug 27 12:06:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:06:54 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Aug 27 12:06:54 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 85 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Aug 27 12:07:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 948939df-2ab3-892b-9f0c-3de9a46051b1 (at 10.9.0.1@o2ib4) in 231 seconds. I think it's dead, and I am evicting it. exp ffff8f3995b67c00, cur 1566932814 expire 1566932664 last 1566932583 Aug 27 12:07:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 12:07:00 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Aug 27 12:07:00 fir-md1-s1 kernel: LNetError: 20187:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Aug 27 12:07:12 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.108@o2ib7: accepting Aug 27 12:07:13 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.201@o2ib7: connected Aug 27 12:07:14 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.103@o2ib7: connected Aug 27 12:07:14 fir-md1-s1 kernel: Pid: 23760, comm: mdt02_110 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:07:14 fir-md1-s1 kernel: Call Trace: Aug 27 12:07:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:07:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:07:14 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:07:14 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:07:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:07:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:07:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:07:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:07:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:07:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932834.23760 Aug 27 12:07:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 12:07:27 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 27 12:07:31 fir-md1-s1 kernel: Pid: 23684, comm: mdt02_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:07:32 fir-md1-s1 kernel: Call Trace: Aug 27 12:07:32 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:07:32 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:07:32 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:07:32 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:07:32 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:07:32 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:07:32 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:07:32 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:07:32 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:07:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932851.23684 Aug 27 12:07:32 fir-md1-s1 kernel: LustreError: 71867:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f2b75857850 x1642528341751664/t0(0) o37->d1a8de5f-e132-abf7-7e4b-84b2d20d113d@10.8.8.31@o2ib6:4/0 lens 448/440 e 0 to 0 dl 1566932824 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:07:48 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this 
service will wait for recovery to complete Aug 27 12:07:48 fir-md1-s1 kernel: Lustre: Skipped 197 previous similar messages Aug 27 12:08:07 fir-md1-s1 kernel: Pid: 20541, comm: mdt00_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:08:07 fir-md1-s1 kernel: Call Trace: Aug 27 12:08:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:08:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:08:07 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:08:07 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:08:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:08:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:08:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:08:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:08:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:08:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932887.20541 Aug 27 12:08:11 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1a1aa62050 x1642973055609872/t0(0) o3->6282e924-823c-ee43-6de9-1b6a734cef6f@10.8.0.67@o2ib6:11/0 lens 488/440 e 0 to 0 dl 1566932921 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:08:11 fir-md1-s1 kernel: LustreError: 25630:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 25 
previous similar messages Aug 27 12:09:09 fir-md1-s1 kernel: Pid: 97667, comm: mdt01_106 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:09:09 fir-md1-s1 kernel: LustreError: 50446:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566932858, 91s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f17ad2b2ac0/0x5d9ee6e659a16cbf lrc: 3/0,1 mode: --/PW res: [0x2c002cda4:0x1:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 50446 timeout: 0 lvb_type: 0 Aug 27 12:09:09 fir-md1-s1 kernel: LustreError: 50446:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 366 previous similar messages Aug 27 12:09:09 fir-md1-s1 kernel: Call Trace: Aug 27 12:09:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x890 [ptlrpc] Aug 27 12:09:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:09:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:09:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:09:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:09:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:09:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:09:09 fir-md1-s1 kernel: [] 
0xffffffffffffffff Aug 27 12:09:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932949.97667 Aug 27 12:09:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932975.23454 Aug 27 12:09:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566932981.23623 Aug 27 12:09:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:09:59 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 27 12:10:48 fir-md1-s1 kernel: LustreError: 21452:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f0ef47ec400 ns: mdt-fir-MDT0000_UUID lock: ffff8f30d63f8fc0/0x5d9ee6e659a6cec3 lrc: 3/0,0 mode: PW/PW res: [0x200029c11:0xec:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.8.29.8@o2ib6 remote: 0xfccaff92bb58a0c2 expref: 766 pid: 21452 timeout: 0 lvb_type: 0 Aug 27 12:10:48 fir-md1-s1 kernel: LustreError: 21452:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 139 previous similar messages Aug 27 12:10:53 fir-md1-s1 kernel: LustreError: 50445:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f194d2d3c00 x1636782182399328/t0(0) o104->fir-MDT0000@10.8.29.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 12:10:53 fir-md1-s1 kernel: LustreError: 50445:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 6 previous similar messages Aug 27 12:10:55 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 12:10:59 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 15 previous similar messages Aug 27 12:10:59 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.102@o2ib7 (5): c: 0, oc: 0, rc: 8 Aug 27 12:10:59 
fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 15 previous similar messages Aug 27 12:10:59 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.105@o2ib7: 8 seconds Aug 27 12:10:59 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 254 previous similar messages Aug 27 12:11:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933064.10149 Aug 27 12:11:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c774eb31-dfd7-6338-06de-2e964154b0ae (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f129045b000, cur 1566933067 expire 1566932917 last 1566932840 Aug 27 12:11:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 12:11:13 fir-md1-s1 kernel: LNet: Service thread pid 23747 was inactive for 200.02s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 12:11:13 fir-md1-s1 kernel: LNet: Skipped 50 previous similar messages Aug 27 12:11:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933072.23747 Aug 27 12:11:36 fir-md1-s1 kernel: LustreError: 21006:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f34f38a5700 x1634183748249952/t0(0) o37->1a643088-ea7a-3acd-f835-98d006253e47@10.8.20.19@o2ib6:6/0 lens 448/440 e 0 to 0 dl 1566933126 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:11:36 fir-md1-s1 kernel: LustreError: 21006:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 23 previous similar messages Aug 27 12:12:03 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 121s: evicting client at 10.9.101.56@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f26f2f0c140/0x5d9ee6e659adb7af lrc: 3/0,0 mode: EX/EX res: [0x2c002cda2:0x2:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.9.101.56@o2ib4 remote: 0xc26f5e2374569f5d expref: 29 
pid: 23710 timeout: 6048090 lvb_type: 3 Aug 27 12:12:03 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 113 previous similar messages Aug 27 12:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c74cabd5-45b1-86e5-60f0-8f68b07a88b1 (at 10.9.103.24@o2ib4), client will retry: rc = -110 Aug 27 12:13:05 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 27 12:13:42 fir-md1-s1 kernel: LustreError: 97649:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566933132, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2403c6bcc0/0x5d9ee6e659c36241 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 13 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e659c36264 expref: -99 pid: 97649 timeout: 0 lvb_type: 0 Aug 27 12:13:48 fir-md1-s1 kernel: Lustre: 46811:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 54s req@ffff8f192901b600 x1642614205273504/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:13:48 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 54s req@ffff8f2c0caca050 x1642476564666544/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:13:48 fir-md1-s1 kernel: Lustre: 26626:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2220428 previous similar messages Aug 27 12:13:48 fir-md1-s1 kernel: Lustre: 46811:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 239 previous similar messages Aug 27 12:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with f0a8fbb7-06c4-ed16-a94f-6cea310ceb29 (at 10.8.0.82@o2ib6), client will retry: rc -107 Aug 27 12:14:13 fir-md1-s1 kernel: Lustre: Skipped 12 previous 
similar messages Aug 27 12:14:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:14:20 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Aug 27 12:14:26 fir-md1-s1 kernel: LNet: Service thread pid 21415 was inactive for 200.13s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 12:14:26 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 12:14:26 fir-md1-s1 kernel: Pid: 21415, comm: mdt02_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:14:26 fir-md1-s1 kernel: Call Trace: Aug 27 12:14:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:14:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:14:26 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:14:26 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:14:26 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:14:26 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:14:26 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:14:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:14:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:14:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:14:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:14:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:14:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:14:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933266.21415 Aug 27 12:14:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 18baa8eb-3796-4c59-4335-f1e0f1008b8c (at 10.9.112.8@o2ib4) Aug 27 12:14:31 fir-md1-s1 kernel: 
Lustre: Skipped 31849 previous similar messages Aug 27 12:14:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 1d81cd7e-3850-4cfe-7531-522b91b4890c (at 10.9.114.1@o2ib4) reconnecting Aug 27 12:14:46 fir-md1-s1 kernel: Lustre: Skipped 29919 previous similar messages Aug 27 12:14:47 fir-md1-s1 kernel: LNet: Service thread pid 21413 completed after 2934.16s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 12:14:47 fir-md1-s1 kernel: LNet: Skipped 50 previous similar messages Aug 27 12:14:57 fir-md1-s1 kernel: Lustre: 23016:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f128b55a700 x1636659586903232/t0(0) o103->fir-MDT0000-lwp-OST0020_UUID@10.0.10.105@o2ib7:2/0 lens 328/0 e 0 to 0 dl 1566933302 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:14:57 fir-md1-s1 kernel: Lustre: 23016:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1578425 previous similar messages Aug 27 12:15:05 fir-md1-s1 kernel: Lustre: 27444:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:2277s); client may timeout. req@ffff8f145579d100 x1642476566626208/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:8/0 lens 336/0 e 0 to 0 dl 1566931028 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:15:05 fir-md1-s1 kernel: Lustre: 27444:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4827585 previous similar messages Aug 27 12:15:25 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 12:15:25 fir-md1-s1 kernel: Lustre: Skipped 30888 previous similar messages Aug 27 12:15:25 fir-md1-s1 kernel: Lustre: 21268:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=4056 reqQ=107623 recA=0, svcEst=20, delay=549 Aug 27 12:15:25 fir-md1-s1 kernel: Lustre: 21268:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 30884 previous similar messages Aug 27 12:15:25 fir-md1-s1 kernel: Lustre: 21765:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f336739b300 x1638896944582336/t0(0) o103->215c0fff-f656-54cb-03f4-a4377c4367c7@10.9.107.21@o2ib4:24/0 lens 328/0 e 0 to 0 dl 1566933324 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:15:25 fir-md1-s1 kernel: Lustre: 21765:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1125945 previous similar messages Aug 27 12:15:30 fir-md1-s1 kernel: LustreError: 24587:0:(ldlm_lockd.c:1285:ldlm_handle_enqueue0()) ### lock on disconnected export ffff8f1613e86800 ns: mdt-fir-MDT0002_UUID lock: ffff8f222e81f2c0/0x5d9ee6e659f278cd lrc: 2/0,0 mode: --/CR res: [0x2c002cc27:0x67da:0x0].0x0 bits 0x0/0x0 rrc: 3 type: IBT flags: 0x40000000000000 nid: local remote: 0xac353b3557a49afc expref: -99 pid: 24587 timeout: 0 lvb_type: 0 Aug 27 12:15:32 fir-md1-s1 kernel: LustreError: 20931:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.109.13@o2ib4: deadline 20:2290s ago req@ffff8f08daf23450 x1634184282921408/t0(0) o103->@:22/0 lens 328/0 e 0 to 0 dl 1566931042 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:15:33 fir-md1-s1 kernel: LustreError: 20931:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 64431 previous similar messages Aug 27 12:15:40 fir-md1-s1 kernel: Lustre: 20217:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566933319/real 1566933319] req@ffff8f2ad2f7cb00 x1636782180535104/t0(0) 
o103->fir-MDT0000-lwp-MDT0002@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566933340 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 12:15:40 fir-md1-s1 kernel: Lustre: 20217:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7961 previous similar messages Aug 27 12:16:19 fir-md1-s1 kernel: LNetError: 23710:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.21.27@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:16:19 fir-md1-s1 kernel: LNetError: 23710:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 33 previous similar messages Aug 27 12:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 12:16:22 fir-md1-s1 kernel: Lustre: Skipped 1619 previous similar messages Aug 27 12:16:35 fir-md1-s1 kernel: LustreError: 21910:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+13s req@ffff8f2ff21ab300 x1641920074553472/t0(0) o37->9b6d8ed0-1e49-aab2-2318-0e7d932be989@10.8.8.26@o2ib6:22/0 lens 448/440 e 0 to 0 dl 1566933382 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 12:16:35 fir-md1-s1 kernel: LustreError: 21910:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Aug 27 12:17:04 fir-md1-s1 kernel: LustreError: 23597:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566933334, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f1cc2e7a1c0/0x5d9ee6e659f2f5b5 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e659f2fae7 expref: -99 pid: 23597 timeout: 0 lvb_type: 0 Aug 27 12:17:53 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 12:17:53 fir-md1-s1 kernel: Lustre: Skipped 122 previous similar messages Aug 27 12:17:54 fir-md1-s1 kernel: LustreError: 
20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f24c0e99c00 Aug 27 12:17:55 fir-md1-s1 kernel: LustreError: 20195:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f3199f96800 Aug 27 12:17:55 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 12:17:55 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 18 previous similar messages Aug 27 12:18:16 fir-md1-s1 kernel: Pid: 23747, comm: mdt02_098 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:18:16 fir-md1-s1 kernel: Call Trace: Aug 27 12:18:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:18:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:18:16 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:18:16 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:18:16 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:18:16 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:18:16 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:18:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:18:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:18:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:18:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:18:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:18:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:18:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933496.23747 Aug 27 12:18:22 fir-md1-s1 kernel: Pid: 20464, comm: mdt02_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:18:22 fir-md1-s1 kernel: Call Trace: Aug 27 12:18:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 
[ptlrpc] Aug 27 12:18:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:18:22 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:18:22 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:18:22 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:18:22 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:18:22 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:18:22 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:18:22 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:18:22 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:18:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:18:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:18:23 fir-md1-s1 kernel: Pid: 25674, comm: mdt03_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:18:23 fir-md1-s1 kernel: Call Trace: Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:18:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:18:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:18:23 fir-md1-s1 kernel: Pid: 23565, comm: mdt00_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:18:23 fir-md1-s1 kernel: Call Trace: Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:18:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:18:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:18:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:18:29 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1566933508.20555 Aug 27 12:18:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f2500469400, cur 1566933509 expire 1566933359 last 1566933282 Aug 27 12:18:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933536.23597 Aug 27 12:19:37 fir-md1-s1 kernel: LustreError: 20724:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566933486, 91s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2234e67080/0x5d9ee6e659fd1dfd lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e659fd1e04 expref: -99 pid: 20724 timeout: 0 lvb_type: 0 Aug 27 12:19:37 fir-md1-s1 kernel: LustreError: 20724:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 27 12:19:38 fir-md1-s1 kernel: LustreError: 27316:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566933488, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f4064d24ec0/0x5d9ee6e659fd3dd8 lrc: 3/0,1 mode: --/CW res: [0x20002a3aa:0xe74:0x0].0x0 bits 0x2/0x0 rrc: 15 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 27316 timeout: 0 lvb_type: 0 Aug 27 12:19:38 fir-md1-s1 kernel: LustreError: 27316:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 104 previous similar messages Aug 27 12:19:41 fir-md1-s1 kernel: LustreError: 6550:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2c198d7c50 x1642973059942896/t0(0) o3->6282e924-823c-ee43-6de9-1b6a734cef6f@10.8.0.67@o2ib6:11/0 lens 488/440 e 0 to 0 dl 1566933611 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:19:41 fir-md1-s1 kernel: LustreError: 
6550:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 11 previous similar messages Aug 27 12:19:51 fir-md1-s1 kernel: Pid: 23659, comm: mdt02_065 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:19:51 fir-md1-s1 kernel: Call Trace: Aug 27 12:19:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:19:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:19:51 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:19:51 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:19:51 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:19:51 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:19:51 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:19:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:19:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:19:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:19:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:19:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:19:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:19:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933591.23659 Aug 27 12:19:54 fir-md1-s1 kernel: Pid: 21181, comm: mdt02_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:19:54 fir-md1-s1 kernel: Call Trace: Aug 27 12:19:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:19:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 12:19:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 12:19:54 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 12:19:54 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 12:19:54 fir-md1-s1 kernel: [] 
mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:19:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:19:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:19:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:19:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:19:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:19:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:19:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933594.21181 Aug 27 12:20:08 fir-md1-s1 kernel: Pid: 20541, comm: mdt00_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:20:08 fir-md1-s1 kernel: Call Trace: Aug 27 12:20:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:20:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:20:08 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:20:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:20:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:20:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:20:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:20:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:20:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933608.20541 Aug 27 12:20:10 fir-md1-s1 kernel: Pid: 50576, comm: mdt03_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:20:10 fir-md1-s1 kernel: Call Trace: Aug 27 12:20:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:20:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:20:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:20:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:20:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:20:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:20:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:20:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:20:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:20:12 fir-md1-s1 kernel: Pid: 21333, comm: mdt02_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 
27 12:20:12 fir-md1-s1 kernel: Call Trace: Aug 27 12:20:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:20:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:20:12 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:20:12 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:20:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:20:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:20:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:20:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:20:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:20:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933612.21333 Aug 27 12:20:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933618.97659 Aug 27 12:20:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933622.20729 Aug 27 12:20:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933639.97648 Aug 27 12:20:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 27 12:20:46 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 27 12:20:47 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f08ac3cbc00 ns: mdt-fir-MDT0000_UUID lock: ffff8f244f360d80/0x5d9ee6e659fded8e lrc: 3/0,0 mode: PW/PW res: [0x200029821:0x1d0f:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.106.41@o2ib4 remote: 0x87b92793371664be expref: 33 pid: 97672 timeout: 0 lvb_type: 0 Aug 27 12:20:47 fir-md1-s1 kernel: LustreError: 97672:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 97 previous similar messages Aug 27 12:21:27 fir-md1-s1 kernel: LNet: Service thread pid 20724 was inactive for 200.73s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Aug 27 12:21:27 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Aug 27 12:21:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933687.20724 Aug 27 12:21:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933694.50442 Aug 27 12:21:43 fir-md1-s1 kernel: LustreError: 21294:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2616729850 x1638101805262368/t0(0) o3->f0a8fbb7-06c4-ed16-a94f-6cea310ceb29@10.8.0.82@o2ib6:2/0 lens 488/440 e 0 to 0 dl 1566933722 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:21:43 fir-md1-s1 kernel: LustreError: 21294:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 21 previous similar messages Aug 27 12:23:24 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 151s: evicting client at 10.9.104.52@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f17a9a78b40/0x5d9ee6e659f6c16c lrc: 3/0,0 mode: PR/PR res: [0x20002a246:0x31af:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.9.104.52@o2ib4 remote: 0xcb2b10f0da7fa546 expref: 235 pid: 10145 timeout: 6048742 lvb_type: 0 Aug 27 12:23:24 fir-md1-s1 kernel: LustreError: 
20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 67 previous similar messages Aug 27 12:23:47 fir-md1-s1 kernel: Lustre: 21305:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 29s req@ffff8f290c12e900 x1642435083385616/t0(0) o103->af4de044-a140-c048-489c-d7654e95426c@10.9.110.39@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:23:47 fir-md1-s1 kernel: Lustre: 21305:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2913134 previous similar messages Aug 27 12:23:47 fir-md1-s1 kernel: LustreError: 36723:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f44b21cf500 x1636782182740720/t0(0) o105->fir-MDT0000@10.9.113.9@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 12:23:47 fir-md1-s1 kernel: LustreError: 36723:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 6 previous similar messages Aug 27 12:24:07 fir-md1-s1 kernel: LustreError: 21871:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f188132c800 x1641920564022208/t0(0) o37->75e153d7-5437-c3e3-f58a-1273d04c8f0e@10.9.101.25@o2ib4:6/0 lens 448/440 e 0 to 0 dl 1566933876 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:24:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933853.23573 Aug 27 12:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 6d4d8c33-ecef-fdb4-378f-8ac8e4e1e0ce (at 10.9.101.34@o2ib4), client will retry: rc -107 Aug 27 12:24:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Aug 27 12:24:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 890658aa-eec6-cfb9-ea7b-2449bb99a05f (at 10.8.11.33@o2ib6) Aug 27 12:24:31 fir-md1-s1 kernel: Lustre: Skipped 31886 previous similar messages Aug 27 12:24:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fir-MDT0000-lwp-OST0001_UUID (at 10.0.10.102@o2ib7) reconnecting Aug 27 12:24:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fir-MDT0000-lwp-OST0009_UUID (at 
10.0.10.102@o2ib7) reconnecting Aug 27 12:24:47 fir-md1-s1 kernel: Lustre: Skipped 30413 previous similar messages Aug 27 12:24:47 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages Aug 27 12:24:54 fir-md1-s1 kernel: LNet: Service thread pid 23610 was inactive for 200.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 12:24:54 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 12:24:54 fir-md1-s1 kernel: Pid: 23610, comm: mdt02_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:24:54 fir-md1-s1 kernel: Call Trace: Aug 27 12:24:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:24:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 12:24:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 12:24:54 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 12:24:54 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:24:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:24:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:24:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:24:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:24:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 
Aug 27 12:24:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:24:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566933894.23610 Aug 27 12:24:58 fir-md1-s1 kernel: Lustre: 33421:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f18221f7800 x1642950216682288/t0(0) o103->29fdc5e8-85e6-bc3e-9056-a0c2a1f07a9a@10.8.4.22@o2ib6:2/0 lens 328/0 e 0 to 0 dl 1566933902 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:24:58 fir-md1-s1 kernel: Lustre: 33421:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2036801 previous similar messages Aug 27 12:25:05 fir-md1-s1 kernel: Lustre: 27444:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2848s); client may timeout. req@ffff8f2d76c7f500 x1634183248032768/t0(0) o103->9871df44-3e50-912f-f998-77063c2447b4@10.9.109.16@o2ib4:7/0 lens 328/0 e 0 to 0 dl 1566931057 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:25:05 fir-md1-s1 kernel: Lustre: 27444:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 127536 previous similar messages Aug 27 12:25:25 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 12:25:25 fir-md1-s1 kernel: Lustre: 25080:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=60 reqQ=292537 recA=0, svcEst=20, delay=52 Aug 27 12:25:25 fir-md1-s1 kernel: Lustre: 25080:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 41941 previous similar messages Aug 27 12:25:25 fir-md1-s1 kernel: Lustre: Skipped 41962 previous similar messages Aug 27 12:25:25 fir-md1-s1 kernel: Lustre: 21285:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-6s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f3e7201e050 x1642614161726032/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:19/0 lens 328/0 e 0 to 0 dl 1566933919 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:25:25 fir-md1-s1 kernel: Lustre: 21285:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1261351 previous similar messages Aug 27 12:25:40 fir-md1-s1 kernel: Lustre: 20222:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566933919/real 1566933919] req@ffff8f1d1d160600 x1636782180543920/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566933940 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 12:25:40 fir-md1-s1 kernel: Lustre: 20222:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2787 previous similar messages Aug 27 12:26:20 fir-md1-s1 kernel: LustreError: 25080:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.10@o2ib6: deadline 30:3237s ago req@ffff8f2140faf200 x1642803104152864/t0(0) o103->d80129c2-0e3c-4dab-61df-4121beba5d58@10.8.27.10@o2ib6:23/0 lens 328/0 e 0 to 0 dl 1566930743 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 12:26:20 fir-md1-s1 kernel: LustreError: 25080:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 151 previous similar messages Aug 27 12:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Aug 27 12:26:22 fir-md1-s1 kernel: Lustre: Skipped 1200 previous similar messages Aug 27 12:26:30 fir-md1-s1 kernel: LustreError: 23761:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566933900, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f2e1546ca40/0x5d9ee6e65a2d03cd lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e65a2d03f7 expref: -99 pid: 23761 timeout: 0 lvb_type: 0 Aug 27 
12:26:30 fir-md1-s1 kernel: LustreError: 23761:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Aug 27 12:27:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 207 seconds. I think it's dead, and I am evicting it. exp ffff8f3672edcc00, cur 1566934059 expire 1566933909 last 1566933852 Aug 27 12:27:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 12:27:54 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0000: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 12:27:54 fir-md1-s1 kernel: Lustre: Skipped 88 previous similar messages Aug 27 12:28:20 fir-md1-s1 kernel: Pid: 23761, comm: mdt02_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:28:20 fir-md1-s1 kernel: Call Trace: Aug 27 12:28:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:28:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 12:28:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 12:28:20 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 12:28:20 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:28:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:28:20 fir-md1-s1 
kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:28:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:28:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:28:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:28:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:28:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934100.23761 Aug 27 12:28:33 fir-md1-s1 kernel: Pid: 23720, comm: mdt02_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:28:33 fir-md1-s1 kernel: Call Trace: Aug 27 12:28:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:28:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:28:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:28:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:28:33 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:28:33 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:28:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:28:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:28:34 fir-md1-s1 kernel: Pid: 97662, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:28:34 fir-md1-s1 kernel: Call Trace: Aug 27 12:28:34 fir-md1-s1 
kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:28:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:28:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:28:34 fir-md1-s1 kernel: Pid: 23592, comm: mdt03_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:28:34 fir-md1-s1 kernel: Call Trace: Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x438/0xb20 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_reint_open+0xc58/0x28b0 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] 
mdt_intent_open+0x82/0x350 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:28:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:28:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:28:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:28:44 fir-md1-s1 kernel: LNet: Service thread pid 23592 completed after 216.14s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 12:28:44 fir-md1-s1 kernel: LNet: Skipped 38 previous similar messages Aug 27 12:29:13 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 12:29:14 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 10 previous similar messages Aug 27 12:29:39 fir-md1-s1 kernel: LustreError: 22285:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566934089, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f18e606a1c0/0x5d9ee6e65a3f9f7e lrc: 3/0,1 mode: --/EX res: [0x20002a40f:0x12:0x0].0x0 bits 0x8/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22285 timeout: 0 lvb_type: 0 Aug 27 12:29:39 fir-md1-s1 kernel: LustreError: 22285:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 200 previous similar messages Aug 27 12:29:53 fir-md1-s1 kernel: LNet: 
20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 5 seconds Aug 27 12:29:53 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 199 previous similar messages Aug 27 12:30:34 fir-md1-s1 kernel: Pid: 23587, comm: mdt03_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:30:34 fir-md1-s1 kernel: Call Trace: Aug 27 12:30:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:30:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:30:34 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:30:34 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:30:34 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:30:34 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:30:34 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:30:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:30:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:30:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:30:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:30:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:30:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:30:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934234.23587 Aug 27 12:30:45 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Aug 27 12:30:45 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 11 previous similar messages Aug 27 12:30:45 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (17): c: 0, oc: 0, rc: 8 Aug 27 12:30:45 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 11 previous similar 
messages Aug 27 12:30:51 fir-md1-s1 kernel: LustreError: 23692:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f3384ddc000 ns: mdt-fir-MDT0000_UUID lock: ffff8f287e3cc380/0x5d9ee6e65a4ca290 lrc: 1/0,0 mode: EX/EX res: [0x20002a41c:0x6:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT flags: 0x54801000000000 nid: 10.9.101.49@o2ib4 remote: 0xdcb76e04d9553104 expref: 20 pid: 23692 timeout: 0 lvb_type: 3 Aug 27 12:30:51 fir-md1-s1 kernel: LustreError: 23692:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 79 previous similar messages Aug 27 12:30:52 fir-md1-s1 kernel: LustreError: 20472:0:(ldlm_lib.c:3273:target_bulk_io()) @@@ truncated bulk READ 0(4096) req@ffff8f1ecae33300 x1631565412227472/t0(0) o37->f7d39296-2681-999e-c9dd-38a3ef8bf584@10.9.106.15@o2ib4:23/0 lens 448/440 e 0 to 0 dl 1566934223 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:30:52 fir-md1-s1 kernel: LustreError: 46581:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(91842) req@ffff8f1a1aaa4c50 x1642528342850432/t0(0) o4->d1a8de5f-e132-abf7-7e4b-84b2d20d113d@10.8.8.31@o2ib6:19/0 lens 488/448 e 0 to 0 dl 1566934219 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:30:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with d1a8de5f-e132-abf7-7e4b-84b2d20d113d (at 10.8.8.31@o2ib6), client will retry: rc = -110 Aug 27 12:30:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Aug 27 12:30:55 fir-md1-s1 kernel: LustreError: 20667:0:(osp_precreate.c:940:osp_precreate_cleanup_orphans()) fir-OST0003-osc-MDT0002: cannot cleanup orphans: rc = -11 Aug 27 12:30:57 fir-md1-s1 kernel: LustreError: 55555:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+22s req@ffff8f0615ff8450 x1642682259667872/t0(0) o256->137d9768-a025-eeea-eb13-e794a9a88228@10.8.11.29@o2ib6:5/0 lens 304/240 e 1 to 0 dl 1566934235 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:30:58 fir-md1-s1 kernel: LNetError: 23751:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 
10.8.28.4@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:30:58 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0c54cd5200 Aug 27 12:30:58 fir-md1-s1 kernel: LNetError: 20734:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.0.67@o2ib6 from Aug 27 12:30:58 fir-md1-s1 kernel: LNetError: 20734:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 1962 previous similar messages Aug 27 12:30:59 fir-md1-s1 kernel: Pid: 24587, comm: mdt01_065 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:30:59 fir-md1-s1 kernel: Call Trace: Aug 27 12:30:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:30:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:30:59 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:30:59 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:30:59 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:30:59 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:30:59 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:30:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:30:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:30:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:30:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:30:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:30:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:30:59 fir-md1-s1 kernel: LNetError: 55491:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.27.6@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:30:59 fir-md1-s1 kernel: LNetError: 55491:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 944 previous similar messages Aug 27 12:31:01 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc 
ffff8f29dffd6a00 Aug 27 12:31:02 fir-md1-s1 kernel: LNetError: 20734:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.0.67@o2ib6 from Aug 27 12:31:02 fir-md1-s1 kernel: LNetError: 20734:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 50562 previous similar messages Aug 27 12:31:05 fir-md1-s1 kernel: Pid: 23619, comm: mdt02_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:31:05 fir-md1-s1 kernel: Call Trace: Aug 27 12:31:05 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:31:05 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:31:05 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:31:05 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:31:05 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:31:05 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:31:05 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:31:05 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:31:05 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:31:05 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:31:05 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:31:05 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:31:05 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:31:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934265.23619 Aug 27 12:31:06 fir-md1-s1 kernel: LNetError: 20734:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.0.67@o2ib6 from Aug 27 12:31:06 fir-md1-s1 kernel: LNetError: 20734:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 113661 previous similar messages Aug 27 12:31:06 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.212@o2ib7: accepting Aug 27 12:31:09 fir-md1-s1 kernel: LNetError: 
109582:0:(o2iblnd_cb.c:3281:kiblnd_cm_callback()) 10.0.10.211@o2ib7 DISCONNECTED Aug 27 12:31:10 fir-md1-s1 kernel: Pid: 23608, comm: mdt02_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:31:10 fir-md1-s1 kernel: Call Trace: Aug 27 12:31:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:31:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:31:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:31:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:31:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:31:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:31:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:31:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:31:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:31:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934270.23608 Aug 27 12:31:13 fir-md1-s1 kernel: Pid: 20734, comm: mdt02_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:31:13 fir-md1-s1 kernel: Call Trace: Aug 27 12:31:13 fir-md1-s1 kernel: [] __cond_resched+0x26/0x30 Aug 27 12:31:13 fir-md1-s1 kernel: [] ptlrpc_check_set.part.23+0x91/0x1df0 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] ptlrpc_check_set+0x5b/0xe0 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] 
ptlrpc_set_wait+0x57d/0x8d0 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:31:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:31:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:31:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:31:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:31:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:31:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:31:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:31:14 fir-md1-s1 kernel: LNetError: 23720:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.0.82@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:31:14 fir-md1-s1 kernel: LNetError: 23720:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 197664 previous similar messages Aug 27 12:31:33 fir-md1-s1 kernel: LustreError: 20734:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.0.67@o2ib6) failed to reply to blocking AST (req@ffff8f2f9de6ce00 x1636782182951936 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f1cbcba8480/0x5d9ee6e65a3da777 lrc: 4/0,0 mode: PR/PR 
res: [0x2c002cdc8:0x2:0x0].0x0 bits 0x5b/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.0.67@o2ib6 remote: 0x765f6755174d5b72 expref: 3408 pid: 97669 timeout: 6049369 lvb_type: 0 Aug 27 12:31:33 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.0.67@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 27 12:31:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:31:33 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Aug 27 12:31:33 fir-md1-s1 kernel: LNetError: 109998:0:(o2iblnd_cb.c:3281:kiblnd_cm_callback()) 10.0.10.108@o2ib7 DISCONNECTED Aug 27 12:31:33 fir-md1-s1 kernel: LNetError: 55549:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.11.24@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:31:33 fir-md1-s1 kernel: LNetError: 55549:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 154401 previous similar messages Aug 27 12:31:39 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.211@o2ib7: accepting Aug 27 12:31:42 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.105@o2ib7: accepting Aug 27 12:31:43 fir-md1-s1 kernel: LNetError: 109962:0:(o2iblnd_cb.c:3281:kiblnd_cm_callback()) 10.0.10.105@o2ib7 DISCONNECTED Aug 27 12:31:49 fir-md1-s1 kernel: LNetError: 69449:0:(o2iblnd_cb.c:3281:kiblnd_cm_callback()) 10.0.10.52@o2ib7 DISCONNECTED Aug 27 12:32:04 fir-md1-s1 kernel: LNet: Service thread pid 23735 was inactive for 200.47s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 27 12:32:05 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Aug 27 12:32:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934324.23735 Aug 27 12:32:06 fir-md1-s1 kernel: LNetError: 36726:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.9.0.61@o2ib4 from Aug 27 12:32:06 fir-md1-s1 kernel: LNetError: 36726:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 37818 previous similar messages Aug 27 12:32:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934326.23684 Aug 27 12:32:10 fir-md1-s1 kernel: LNetError: 110079:0:(o2iblnd_cb.c:3281:kiblnd_cm_callback()) 10.0.10.108@o2ib7 DISCONNECTED Aug 27 12:32:10 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.209@o2ib7: accepting Aug 27 12:32:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934345.22282 Aug 27 12:32:44 fir-md1-s1 kernel: LustreError: 21708:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f1a1aaf6050 x1641918449913008/t0(0) o4->c68e98dd-6420-0e44-f2a5-d74db1d720f2@10.9.104.5@o2ib4:26/0 lens 488/448 e 0 to 0 dl 1566934376 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:32:44 fir-md1-s1 kernel: LustreError: 21708:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 14 previous similar messages Aug 27 12:33:26 fir-md1-s1 kernel: LustreError: 21897:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f43aa96c500 x1643047273745920/t0(0) o37->01220ca0-c29f-4cb8-bddb-c495482aa608@10.9.0.61@o2ib4:23/0 lens 448/440 e 0 to 0 dl 1566934403 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:33:26 fir-md1-s1 kernel: LustreError: 21897:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 19 previous similar messages Aug 27 12:33:34 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 13s: evicting client at 10.9.103.25@o2ib4 ns: mdt-fir-MDT0002_UUID lock: 
ffff8f17846bc140/0x5d9ee6e65a493cc2 lrc: 3/0,0 mode: PR/PR res: [0x2c0026ed7:0x16970:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.9.103.25@o2ib4 remote: 0xce5c6e05f9980c11 expref: 17682 pid: 23593 timeout: 6049455 lvb_type: 0 Aug 27 12:33:34 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 78 previous similar messages Aug 27 12:33:47 fir-md1-s1 kernel: Lustre: 31013:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 28s req@ffff8f2d1e1e4200 x1640712985348176/t0(0) o103->@:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:33:47 fir-md1-s1 kernel: Lustre: 31013:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2125567 previous similar messages Aug 27 12:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with cea6adbc-46ce-842f-a429-3350fc5db284 (at 10.8.18.26@o2ib6), client will retry: rc -110 Aug 27 12:34:24 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Aug 27 12:34:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8277604b-043b-5587-f5e2-29fe09f8890f (at 10.8.7.2@o2ib6) Aug 27 12:34:31 fir-md1-s1 kernel: Lustre: Skipped 30563 previous similar messages Aug 27 12:34:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client dedbe9ee-8903-d6b4-bf80-d42c33abfec1 (at 10.9.108.57@o2ib4) reconnecting Aug 27 12:34:47 fir-md1-s1 kernel: Lustre: Skipped 27381 previous similar messages Aug 27 12:35:21 fir-md1-s1 kernel: Lustre: 97662:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (1/-117), not sending early reply req@ffff8f1f945fd400 x1636679632076320/t0(0) o101->cea6adbc-46ce-842f-a429-3350fc5db284@10.8.18.26@o2ib6:19/0 lens 480/568 e 0 to 0 dl 1566934519 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 12:35:21 fir-md1-s1 kernel: Lustre: 97662:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1136652 previous similar messages Aug 27 12:35:21 fir-md1-s1 kernel: Lustre: 
20371:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:4170s); client may timeout. req@ffff8f2596605400 x1642613912816656/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:18/0 lens 328/0 e 0 to 0 dl 1566930348 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:35:21 fir-md1-s1 kernel: Lustre: 20371:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 179955 previous similar messages Aug 27 12:35:25 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 12:35:25 fir-md1-s1 kernel: Lustre: 21366:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=17 reqQ=210536 recA=0, svcEst=20, delay=179 Aug 27 12:35:25 fir-md1-s1 kernel: Lustre: 21366:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 39762 previous similar messages Aug 27 12:35:25 fir-md1-s1 kernel: Lustre: 46812:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-36s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f3596668900 x1642614205714672/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:19/0 lens 328/0 e 0 to 0 dl 1566934489 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:35:25 fir-md1-s1 kernel: Lustre: 46812:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1193075 previous similar messages Aug 27 12:35:25 fir-md1-s1 kernel: Lustre: Skipped 39800 previous similar messages Aug 27 12:35:29 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#8 stuck for 23s! 
[ldlm_cn00_010:23016] Aug 27 12:35:29 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses enclosure dm_multipath ipmi_si pcspkr dm_mod ipmi_devintf k10temp ccp sg i2c_piix4 ipmi_msghandler acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif Aug 27 12:35:29 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx5_core(OE) libahci mlxfw(OE) crct10dif_pclmul devlink crct10dif_common tg3 mlx_compat(OE) drm ptp libata megaraid_sas crc32c_intel drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas Aug 27 12:35:29 fir-md1-s1 kernel: CPU: 8 PID: 23016 Comm: ldlm_cn00_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 Aug 27 12:35:29 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 Aug 27 12:35:29 fir-md1-s1 kernel: task: ffff8f153400e180 ti: ffff8f1273824000 task.ti: ffff8f1273824000 Aug 27 12:35:29 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Aug 27 12:35:29 fir-md1-s1 kernel: RSP: 0018:ffff8f1273827da8 EFLAGS: 00000246 Aug 27 12:35:29 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff8f152babfc90 RCX: 0000000000410000 Aug 27 12:35:29 fir-md1-s1 kernel: RDX: ffff8f457f59b780 RSI: 0000000000d90101 RDI: ffff8f0f78be3a88 Aug 27 12:35:29 fir-md1-s1 kernel: RBP: ffff8f1273827da8 R08: ffff8f153ee9b780 R09: 0000000000000000 Aug 27 12:35:29 fir-md1-s1 kernel: R10: 0000000000000002 R11: fffff3b930ba4600 R12: 0000000000000258 Aug 27 12:35:29 fir-md1-s1 kernel: R13: ffff8f1271fc4840 R14: 00000000bd6fad7c R15: 000000000003fb69 Aug 27 12:35:29 fir-md1-s1 kernel: FS: 00007f17a30c1700(0000) GS:ffff8f153ee80000(0000) knlGS:0000000000000000 Aug 27 12:35:29 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 27 12:35:29 fir-md1-s1 kernel: CR2: 0000000000a1acd8 CR3: 000000103c36c000 CR4: 00000000003407e0 Aug 27 12:35:29 fir-md1-s1 kernel: Call Trace: Aug 27 12:35:29 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Aug 27 12:35:29 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Aug 27 12:35:29 fir-md1-s1 kernel: [] ptlrpc_server_hpreq_fini+0x68/0x170 [ptlrpc] Aug 27 12:35:29 fir-md1-s1 kernel: [] ptlrpc_main+0xdc0/0x1fc0 [ptlrpc] Aug 27 12:35:29 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Aug 27 12:35:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:35:29 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Aug 27 12:35:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:35:29 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 Aug 27 12:35:29 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 85 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Aug 27 12:35:30 fir-md1-s1 kernel: LNetError: 97641:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.18.26@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:35:30 fir-md1-s1 kernel: LNetError: 97641:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 50307 previous similar messages Aug 27 12:35:36 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.201@o2ib7: accepting Aug 27 12:35:38 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b62d62200 Aug 27 12:35:38 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b62d60a00 Aug 27 12:35:38 fir-md1-s1 kernel: LustreError: 20188:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0b62d62000 Aug 27 12:35:42 fir-md1-s1 kernel: Lustre: 20218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566934521/real 1566934521] req@ffff8f16bb44da00 x1636782180543856/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 2 to 1 dl 1566934542 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 12:35:42 fir-md1-s1 kernel: Lustre: 20218:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 633672 previous similar messages Aug 27 12:35:48 fir-md1-s1 kernel: LustreError: 107737:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0e9bc50f00 x1636782183084848/t0(0) o105->fir-MDT0002@10.9.103.14@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 12:35:48 fir-md1-s1 kernel: LustreError: 107737:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar 
messages Aug 27 12:36:13 fir-md1-s1 kernel: LNet: Service thread pid 22006 was inactive for 200.26s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 12:36:13 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 12:36:13 fir-md1-s1 kernel: Pid: 22006, comm: mdt01_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:36:13 fir-md1-s1 kernel: Call Trace: Aug 27 12:36:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:36:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:36:13 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:36:13 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:36:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:36:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:36:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:36:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:36:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:36:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:36:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:36:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:36:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:36:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934573.22006 Aug 27 12:36:24 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.102.12@o2ib4, removing former export from same NID Aug 27 12:36:24 fir-md1-s1 kernel: Lustre: Skipped 3752 previous similar messages Aug 27 12:36:26 fir-md1-s1 kernel: Pid: 22285, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:36:26 fir-md1-s1 kernel: Call Trace: Aug 27 12:36:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 
12:36:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:36:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:36:27 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:36:27 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:36:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:36:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:36:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:36:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:36:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:36:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934587.22285 Aug 27 12:36:32 fir-md1-s1 kernel: Pid: 26254, comm: mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:36:32 fir-md1-s1 kernel: Call Trace: Aug 27 12:36:32 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:36:32 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:36:32 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:36:32 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:36:32 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:36:32 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:36:32 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:36:32 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:36:32 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:36:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934592.26254 Aug 27 12:36:34 fir-md1-s1 kernel: Pid: 20732, comm: mdt02_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:36:34 fir-md1-s1 kernel: Call Trace: Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:36:34 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:36:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:36:34 fir-md1-s1 kernel: Pid: 23672, comm: mdt00_100 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:36:34 fir-md1-s1 kernel: Call Trace: Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:36:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:36:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:36:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:36:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934603.23620 Aug 27 12:36:46 fir-md1-s1 kernel: LustreError: 25082:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.114.1@o2ib4: deadline 30:1s ago req@ffff8f223ca9ce00 x1631574978986016/t0(0) o103->1d81cd7e-3850-4cfe-7531-522b91b4890c@10.9.114.1@o2ib4:15/0 lens 328/0 e 0 to 0 dl 1566934605 ref 1 fl Interpret:H/0/ffffffff rc 0/-1 Aug 27 12:36:46 fir-md1-s1 kernel: LustreError: 
25082:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 304 previous similar messages Aug 27 12:36:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934607.23692 Aug 27 12:36:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934615.21446 Aug 27 12:36:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934616.10506 Aug 27 12:36:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934619.21434 Aug 27 12:37:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934633.97642 Aug 27 12:38:02 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 12:38:02 fir-md1-s1 kernel: Lustre: Skipped 259 previous similar messages Aug 27 12:38:52 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f11937a3000, cur 1566934732 expire 1566934582 last 1566934505 Aug 27 12:38:52 fir-md1-s1 kernel: Lustre: Skipped 253 previous similar messages Aug 27 12:39:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934775.23588 Aug 27 12:39:41 fir-md1-s1 kernel: LustreError: 21458:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566934688, 91s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f16752533c0/0x5d9ee6e65a706731 lrc: 3/0,1 mode: --/PW res: [0x200029c72:0x1e:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21458 timeout: 0 lvb_type: 0 Aug 27 12:39:41 fir-md1-s1 kernel: LustreError: 21458:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 146 previous similar messages Aug 27 12:39:43 fir-md1-s1 kernel: LustreError: 21909:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f2d2a023900 x1636679634043744/t0(0) o37->cea6adbc-46ce-842f-a429-3350fc5db284@10.8.18.26@o2ib6:12/0 lens 448/440 e 0 to 0 dl 1566934812 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:39:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934785.23077 Aug 27 12:39:46 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 12:39:46 fir-md1-s1 kernel: LNetError: 20189:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 20 previous similar messages Aug 27 12:39:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934794.23593 Aug 27 12:40:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934814.23661 Aug 27 12:40:37 fir-md1-s1 kernel: LustreError: 71848:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f076dd04200 x1642587839081584/t0(0) o37->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:26/0 lens 448/440 e 0 to 0 dl 1566934856 
ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 12:41:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934864.24584 Aug 27 12:41:24 fir-md1-s1 kernel: Pid: 21145, comm: mdt03_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:41:24 fir-md1-s1 kernel: Call Trace: Aug 27 12:41:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:41:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:41:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:41:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:41:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:41:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:41:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:41:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:41:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:41:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934884.21145 Aug 27 12:41:39 fir-md1-s1 kernel: LustreError: 20467:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f364279bc00 ns: mdt-fir-MDT0000_UUID lock: ffff8f3c48708000/0x5d9ee6e65a737ce8 lrc: 3/0,0 mode: --/PW res: [0x200029957:0x1fc:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x54a01000000000 nid: 10.9.108.58@o2ib4 remote: 0x8a5940d2c2bed8f3 expref: 10 pid: 20467 
timeout: 0 lvb_type: 0 Aug 27 12:41:39 fir-md1-s1 kernel: LustreError: 20467:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 107 previous similar messages Aug 27 12:42:11 fir-md1-s1 kernel: Pid: 23601, comm: mdt02_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:42:11 fir-md1-s1 kernel: Call Trace: Aug 27 12:42:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:42:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:42:11 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:42:11 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:42:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:42:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:42:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:42:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:42:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:42:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934928.23601 Aug 27 12:42:11 fir-md1-s1 kernel: LustreError: 71857:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 20+5s req@ffff8f2fddb4e900 x1642614221296976/t0(0) o37->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:6/0 lens 448/440 e 0 to 0 dl 1566934926 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 12:42:11 fir-md1-s1 kernel: LustreError: 
71857:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 41 previous similar messages Aug 27 12:42:12 fir-md1-s1 kernel: LNet: Service thread pid 23454 completed after 200.16s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 12:42:12 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Aug 27 12:42:52 fir-md1-s1 kernel: LustreError: 71867:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f3faa88b600 x1636679634443616/t0(0) o37->cea6adbc-46ce-842f-a429-3350fc5db284@10.8.18.26@o2ib6:20/0 lens 448/440 e 0 to 0 dl 1566935000 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:42:52 fir-md1-s1 kernel: LustreError: 71867:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 121 previous similar messages Aug 27 12:42:56 fir-md1-s1 kernel: LustreError: 20722:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 344357: error -110 Aug 27 12:43:08 fir-md1-s1 kernel: Pid: 23657, comm: mdt02_064 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:43:08 fir-md1-s1 kernel: Call Trace: Aug 27 12:43:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:43:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:43:08 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 12:43:08 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 12:43:08 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 12:43:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:43:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:43:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:43:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:43:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:43:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934988.23657 Aug 27 12:43:12 fir-md1-s1 kernel: Pid: 23652, comm: mdt02_062 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:43:12 fir-md1-s1 kernel: Call Trace: Aug 27 12:43:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:43:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:43:12 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:43:12 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:43:12 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:43:12 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:43:12 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:43:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:43:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:43:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:43:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:43:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:43:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:43:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566934992.23652 Aug 27 12:43:31 fir-md1-s1 kernel: LustreError: 27063:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2c7a556c00 x1636679634540432/t0(0) o37->cea6adbc-46ce-842f-a429-3350fc5db284@10.8.18.26@o2ib6:29/0 lens 448/440 e 0 to 0 dl 1566935039 ref 1 fl Interpret:/0/0 rc 0/0 
Aug 27 12:43:31 fir-md1-s1 kernel: LustreError: 27063:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 61 previous similar messages Aug 27 12:43:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.105.23@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8f0df89fb3c0/0x5d9ee6e65a784134 lrc: 3/0,0 mode: CR/CR res: [0x20002a42d:0x1:0x0].0x0 bits 0x9/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.9.105.23@o2ib4 remote: 0x33cf8ab5b24e99e5 expref: 20 pid: 21370 timeout: 6050083 lvb_type: 0 Aug 27 12:43:43 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 75 previous similar messages Aug 27 12:43:47 fir-md1-s1 kernel: Lustre: 28233:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 39s req@ffff8f3c9acc3850 x1642614204042160/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:43:47 fir-md1-s1 kernel: Lustre: 28233:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1953478 previous similar messages Aug 27 12:44:00 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 5 seconds Aug 27 12:44:00 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 500 previous similar messages Aug 27 12:44:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds Aug 27 12:44:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 37 previous similar messages Aug 27 12:44:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.212@o2ib7 (2): c: 0, oc: 0, rc: 8 Aug 27 12:44:00 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 37 previous similar messages Aug 27 12:44:01 fir-md1-s1 
kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8f0866bcc400 Aug 27 12:44:08 fir-md1-s1 kernel: LustreError: 20180:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8f0866bce600 Aug 27 12:44:09 fir-md1-s1 kernel: LNetError: 23720:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.22.25@o2ib6 from 10.0.10.51@o2ib7 Aug 27 12:44:09 fir-md1-s1 kernel: LNetError: 23720:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 23 previous similar messages Aug 27 12:44:11 fir-md1-s1 kernel: LustreError: 21429:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 3004: error -110 Aug 27 12:44:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 639ee15e-2da6-9d93-315b-2c6ce5340bd5 (at 10.8.26.2@o2ib6) Aug 27 12:44:31 fir-md1-s1 kernel: Lustre: Skipped 32538 previous similar messages Aug 27 12:44:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fir-MDT0000-lwp-OST002d_UUID (at 10.0.10.108@o2ib7) reconnecting Aug 27 12:44:47 fir-md1-s1 kernel: Lustre: Skipped 29864 previous similar messages Aug 27 12:44:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with c74cabd5-45b1-86e5-60f0-8f68b07a88b1 (at 10.9.103.24@o2ib4), client will retry: rc = -110 Aug 27 12:44:53 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 12:44:54 fir-md1-s1 kernel: Pid: 21461, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:44:54 fir-md1-s1 kernel: Call Trace: Aug 27 12:44:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:44:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:44:54 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:44:54 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:44:54 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:44:54 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 
12:44:54 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:44:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:44:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:44:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:44:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:44:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:44:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:44:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935094.21461 Aug 27 12:45:18 fir-md1-s1 kernel: Lustre: 25084:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (0/0), not sending early reply req@ffff8f11d1ef4200 x1631553301376912/t0(0) o103->9f2ddc86-65fa-8a70-8eea-d37d69d7c71f@10.9.106.64@o2ib4:18/0 lens 328/0 e 0 to 0 dl 1566935118 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:45:18 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:4359s); client may timeout. req@ffff8f1f1957e300 x1636427980301200/t0(0) o103->095971d4-2c15-c9c6-8336-964f67ec504b@10.9.105.69@o2ib4:9/0 lens 328/0 e 0 to 0 dl 1566930759 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:45:18 fir-md1-s1 kernel: Lustre: 25082:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 198908 previous similar messages Aug 27 12:45:18 fir-md1-s1 kernel: Lustre: 25084:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1125520 previous similar messages Aug 27 12:45:25 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
Aug 27 12:45:25 fir-md1-s1 kernel: Lustre: 30997:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=3 reqQ=1823326 recA=0, svcEst=20, delay=0 Aug 27 12:45:25 fir-md1-s1 kernel: Lustre: 30997:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 38238 previous similar messages Aug 27 12:45:25 fir-md1-s1 kernel: Lustre: 30997:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-299s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8f14ae347800 x1642587827506336/t0(0) o103->@:26/0 lens 328/0 e 0 to 0 dl 1566934826 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:45:25 fir-md1-s1 kernel: Lustre: 30997:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 941097 previous similar messages Aug 27 12:45:25 fir-md1-s1 kernel: Lustre: Skipped 38224 previous similar messages Aug 27 12:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 9d6c448c-7311-384d-7af2-feecee7f2c1a (at 10.9.116.6@o2ib4), client will retry: rc -110 Aug 27 12:45:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Aug 27 12:45:33 fir-md1-s1 kernel: LNet: Service thread pid 23701 was inactive for 200.83s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 27 12:45:33 fir-md1-s1 kernel: LNet: Skipped 19 previous similar messages Aug 27 12:45:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935133.23701 Aug 27 12:45:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935149.23676 Aug 27 12:45:50 fir-md1-s1 kernel: Lustre: 20225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566935129/real 1566935129] req@ffff8f16bb44c800 x1636782180542768/t0(0) o103->fir-MDT0000-lwp-MDT0000@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566935150 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 12:45:50 fir-md1-s1 kernel: Lustre: 20225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3231 previous similar messages Aug 27 12:46:08 fir-md1-s1 kernel: LustreError: 23627:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 325064: error -110 Aug 27 12:46:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 12:46:29 fir-md1-s1 kernel: Lustre: Skipped 1714 previous similar messages Aug 27 12:46:47 fir-md1-s1 kernel: LustreError: 31012:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.104.28@o2ib4: deadline 30:4844s ago req@ffff8f33215ed100 x1641917757186896/t0(0) o103->@:2/0 lens 328/0 e 0 to 0 dl 1566930362 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:46:47 fir-md1-s1 kernel: LustreError: 31012:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 411 previous similar messages Aug 27 12:47:33 fir-md1-s1 kernel: LustreError: 23638:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566935163, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f435d73a400/0x5d9ee6e65aa78a41 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e65aa78a95 
expref: -99 pid: 23638 timeout: 0 lvb_type: 0 Aug 27 12:47:45 fir-md1-s1 kernel: LustreError: 107740:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f2c43148300 x1636782185720640/t0(0) o105->fir-MDT0002@10.9.103.14@o2ib4:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 12:47:45 fir-md1-s1 kernel: LustreError: 107740:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Aug 27 12:47:47 fir-md1-s1 kernel: LNet: Service thread pid 23698 was inactive for 200.89s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 12:47:47 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 12:47:47 fir-md1-s1 kernel: Pid: 23698, comm: mdt00_106 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:47:47 fir-md1-s1 kernel: Call Trace: Aug 27 12:47:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:47:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:47:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:47:47 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:47:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:47:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:47:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:47:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:47:48 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:47:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:47:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935267.23698 Aug 27 12:47:51 fir-md1-s1 kernel: Lustre: 35230:0:(tgt_handler.c:562:tgt_handle_recovery()) @@@ rq_xid 1631660212628432 matches saved xid, expected REPLAY or RESENT flag (0) req@ffff8f0eb21e6850 x1631660212628432/t0(0) o4->84b23abe-92b9-23b5-f8e1-877bc9a84312@10.9.103.15@o2ib4:6/0 lens 4168/0 e 0 to 0 dl 1566935286 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 12:48:09 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 12:48:09 fir-md1-s1 kernel: Lustre: Skipped 163 previous similar messages Aug 27 12:48:18 fir-md1-s1 kernel: Pid: 50445, comm: mdt01_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:48:18 fir-md1-s1 kernel: Call Trace: Aug 27 12:48:18 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:48:18 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:48:18 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:48:18 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:48:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:48:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:48:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:48:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:48:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:48:18 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:48:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:48:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:48:18 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:48:18 fir-md1-s1 kernel: 
LustreError: dumping log to /tmp/lustre-log.1566935297.50445 Aug 27 12:48:27 fir-md1-s1 kernel: Lustre: 46562:0:(tgt_handler.c:562:tgt_handle_recovery()) @@@ rq_xid 1642977612032816 matches saved xid, expected REPLAY or RESENT flag (0) req@ffff8f1a1aaa3050 x1642977612032816/t0(0) o4->7372bd8e-4f77-9af0-e0f4-c1915e510b36@10.9.103.22@o2ib4:14/0 lens 664/0 e 0 to 0 dl 1566935324 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 12:48:27 fir-md1-s1 kernel: Pid: 23695, comm: mdt02_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:48:28 fir-md1-s1 kernel: Call Trace: Aug 27 12:48:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:48:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:48:28 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:48:28 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:48:28 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:48:28 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:48:28 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:48:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:48:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:48:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:48:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:48:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:48:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:48:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935307.23695 Aug 27 12:48:33 fir-md1-s1 kernel: Lustre: 14791:0:(tgt_handler.c:562:tgt_handle_recovery()) @@@ rq_xid 1642960512935600 matches saved xid, expected REPLAY or RESENT flag (0) req@ffff8f053f1e9450 x1642960512935600/t0(0) o4->01c5290e-2f99-d714-0fa9-403481192ee7@10.9.103.1@o2ib4:22/0 lens 808/0 e 0 to 0 dl 1566935332 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Aug 27 
12:49:23 fir-md1-s1 kernel: Pid: 23638, comm: mdt03_064 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:49:23 fir-md1-s1 kernel: Call Trace: Aug 27 12:49:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:49:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Aug 27 12:49:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Aug 27 12:49:23 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Aug 27 12:49:23 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:49:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:49:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:49:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:49:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:49:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:49:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:49:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935363.23638 Aug 27 12:49:49 fir-md1-s1 kernel: LustreError: 50447:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 275116: error -110 Aug 27 12:49:54 fir-md1-s1 kernel: LustreError: 21419:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566935304, 90s ago); not entering recovery in server code, 
just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f06cb273840/0x5d9ee6e65ab64d1b lrc: 3/0,1 mode: --/PW res: [0x2c002bf55:0x1480:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21419 timeout: 0 lvb_type: 0 Aug 27 12:49:54 fir-md1-s1 kernel: LustreError: 21419:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 128 previous similar messages Aug 27 12:50:05 fir-md1-s1 kernel: LustreError: 23594:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 301045: error -110 Aug 27 12:50:14 fir-md1-s1 kernel: Pid: 10144, comm: mdt02_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:50:14 fir-md1-s1 kernel: Call Trace: Aug 27 12:50:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:50:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:50:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:50:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:50:14 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:50:14 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:50:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:50:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:50:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:50:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:50:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:50:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:50:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:50:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:50:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:50:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:50:17 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1566935414.10144 Aug 27 12:50:24 fir-md1-s1 kernel: LustreError: 21829:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f2776da8000 x1642528344117440/t0(0) o37->d1a8de5f-e132-abf7-7e4b-84b2d20d113d@10.8.8.31@o2ib6:23/0 lens 448/440 e 0 to 0 dl 1566935453 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:50:30 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 12:50:30 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Aug 27 12:50:46 fir-md1-s1 kernel: LustreError: 20729:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 289046: error -110 Aug 27 12:51:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c3c6d770-0204-0300-bf5f-7d6d6b4e4478 (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f36a9345000, cur 1566935492 expire 1566935342 last 1566935265 Aug 27 12:51:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 12:51:48 fir-md1-s1 kernel: LustreError: 71889:0:(ldlm_lib.c:3252:target_bulk_io()) @@@ Eviction on bulk READ req@ffff8f2a70848900 x1634134187201040/t0(0) o37->a7aad8e9-6055-f520-5dcf-5ea6b8e2ae73@10.9.104.52@o2ib4:17/0 lens 448/440 e 0 to 0 dl 1566935537 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:51:48 fir-md1-s1 kernel: LustreError: 23753:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f2ca7b63c00 ns: mdt-fir-MDT0000_UUID lock: ffff8f327fe633c0/0x5d9ee6e65abc3958 lrc: 3/0,0 mode: PW/PW res: [0x2000298b0:0x2cb:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.106.15@o2ib4 remote: 0xa05eeaf4b7716e8d expref: 3 pid: 23753 timeout: 0 lvb_type: 0 Aug 27 12:51:48 fir-md1-s1 kernel: LustreError: 23753:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 99 previous similar messages Aug 27 12:52:14 fir-md1-s1 kernel: LustreError: dumping log 
to /tmp/lustre-log.1566935533.22279 Aug 27 12:52:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:52:14 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Aug 27 12:52:35 fir-md1-s1 kernel: LNet: Service thread pid 20555 completed after 2247.56s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 12:52:35 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 27 12:52:37 fir-md1-s1 kernel: LustreError: 20990:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 30+2s req@ffff8f321d7c3f00 x1642528344228688/t0(0) o37->d1a8de5f-e132-abf7-7e4b-84b2d20d113d@10.8.8.31@o2ib6:5/0 lens 448/440 e 0 to 0 dl 1566935555 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 12:52:37 fir-md1-s1 kernel: LustreError: 20990:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 7 previous similar messages Aug 27 12:52:55 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f2d018ea850 x1643050012041472/t0(0) o3->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:24/0 lens 488/440 e 0 to 0 dl 1566935604 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:52:55 fir-md1-s1 kernel: LustreError: 22670:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 125 previous similar messages Aug 27 12:53:12 fir-md1-s1 kernel: Pid: 23622, comm: mdt02_058 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:53:12 fir-md1-s1 kernel: Call Trace: Aug 27 12:53:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:53:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:53:12 fir-md1-s1 
kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:53:12 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:53:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:53:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:53:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:53:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:53:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:53:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:53:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935592.23622 Aug 27 12:53:15 fir-md1-s1 kernel: Pid: 21456, comm: mdt01_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:53:15 fir-md1-s1 kernel: Call Trace: Aug 27 12:53:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:53:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:53:15 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:53:15 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:53:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:53:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:53:15 
fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:53:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:53:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:53:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:53:32 fir-md1-s1 kernel: LustreError: 21903:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f2ac4c76850 x1636679635794832/t0(0) o37->cea6adbc-46ce-842f-a429-3350fc5db284@10.8.18.26@o2ib6:2/0 lens 448/440 e 0 to 0 dl 1566935642 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 12:53:32 fir-md1-s1 kernel: LustreError: 21903:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 93 previous similar messages Aug 27 12:53:47 fir-md1-s1 kernel: Pid: 21447, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:53:47 fir-md1-s1 kernel: Lustre: 21381:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 22s req@ffff8f43d61c0300 x1642614206451136/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 Aug 27 12:53:47 fir-md1-s1 kernel: Lustre: 21381:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2235321 previous similar messages Aug 27 12:53:47 fir-md1-s1 kernel: Call Trace: Aug 27 12:53:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:53:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:53:47 fir-md1-s1 kernel: [] mdt_rename_lock+0x24b/0x4b0 [mdt] Aug 27 12:53:47 fir-md1-s1 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Aug 27 12:53:47 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:53:47 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:53:47 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:53:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:53:47 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:53:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:53:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:53:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:53:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:53:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935628.21447 Aug 27 12:53:58 fir-md1-s1 kernel: Pid: 26257, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:53:58 fir-md1-s1 kernel: Call Trace: Aug 27 12:53:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:53:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:53:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:53:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:53:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:53:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:53:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:53:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:53:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:53:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935637.26257 Aug 27 12:54:06 fir-md1-s1 kernel: LustreError: 97665:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock 
timed out (enqueued at 1566935555, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f209d990240/0x5d9ee6e65ad43490 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e65ad43497 expref: -99 pid: 97665 timeout: 0 lvb_type: 0 Aug 27 12:54:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b0f95b8-bc58-6a77-0e21-a3225e91db7a (at 10.9.103.1@o2ib4) Aug 27 12:54:32 fir-md1-s1 kernel: Lustre: Skipped 35318 previous similar messages Aug 27 12:54:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:54:43 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Aug 27 12:54:45 fir-md1-s1 kernel: Pid: 10364, comm: mdt03_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:54:45 fir-md1-s1 kernel: Call Trace: Aug 27 12:54:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:54:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:54:45 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:54:45 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:54:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:54:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 
[ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:54:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:54:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:54:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:54:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935685.10364 Aug 27 12:54:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3ac23e7c-2046-1580-e1d9-0544ac26daff (at 10.9.109.34@o2ib4) reconnecting Aug 27 12:54:47 fir-md1-s1 kernel: Lustre: Skipped 34578 previous similar messages Aug 27 12:55:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.102.25@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f445da486c0/0x5d9ee6e65ac1f74a lrc: 3/0,0 mode: PR/PR res: [0x2c002cd67:0x5:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x60000400010020 nid: 10.9.102.25@o2ib4 remote: 0xbbc378ede79b4f98 expref: 57 pid: 23639 timeout: 6050763 lvb_type: 0 Aug 27 12:55:04 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 80 previous similar messages Aug 27 12:55:19 fir-md1-s1 kernel: Lustre: 21381:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2249s); client may timeout. 
req@ffff8f271b606900 x1642614182171136/t0(0) o103->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:19/0 lens 328/0 e 0 to 0 dl 1566933469 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 12:55:19 fir-md1-s1 kernel: Lustre: 21381:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5599367 previous similar messages Aug 27 12:55:22 fir-md1-s1 kernel: Lustre: 21462:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (0/-10), not sending early reply req@ffff8f429bccdc50 x1633732205034416/t0(0) o103->7b7f2a08-e532-c3e9-fc40-4f2f2aa57c7c@10.9.105.23@o2ib4:22/0 lens 328/0 e 0 to 0 dl 1566935722 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:55:22 fir-md1-s1 kernel: Lustre: 21462:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1760198 previous similar messages Aug 27 12:55:25 fir-md1-s1 kernel: Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). Aug 27 12:55:25 fir-md1-s1 kernel: Lustre: 20373:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=27 reqQ=116054 recA=0, svcEst=20, delay=99 Aug 27 12:55:25 fir-md1-s1 kernel: Lustre: 20373:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 37602 previous similar messages Aug 27 12:55:25 fir-md1-s1 kernel: Lustre: 20373:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-20s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8f08f3c46300 x1642476557724800/t0(0) o103->c774eb31-dfd7-6338-06de-2e964154b0ae@10.0.10.3@o2ib7:5/0 lens 336/0 e 0 to 0 dl 1566935705 ref 2 fl New:/2/ffffffff rc 0/-1 Aug 27 12:55:25 fir-md1-s1 kernel: Lustre: 20373:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 860891 previous similar messages Aug 27 12:55:25 fir-md1-s1 kernel: Lustre: Skipped 37623 previous similar messages Aug 27 12:55:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with f1bad2aa-6db1-dd20-85dc-e36aabd3f07a (at 10.9.103.34@o2ib4), client will retry: rc = -107 Aug 27 12:55:48 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Aug 27 12:55:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with 1a643088-ea7a-3acd-f835-98d006253e47 (at 10.8.20.19@o2ib6), client will retry: rc -107 Aug 27 12:55:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Aug 27 12:55:52 fir-md1-s1 kernel: LNet: Service thread pid 26258 was inactive for 200.69s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 27 12:55:52 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 27 12:55:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935752.26258 Aug 27 12:55:58 fir-md1-s1 kernel: Lustre: 20218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566935737/real 1566935737] req@ffff8f16bb44f500 x1636782180543056/t0(0) o103->fir-MDT0000-lwp-MDT0002@0@lo:17/18 lens 328/224 e 1 to 1 dl 1566935758 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Aug 27 12:55:58 fir-md1-s1 kernel: Lustre: 20218:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3274 previous similar messages Aug 27 12:56:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Aug 27 12:56:37 fir-md1-s1 kernel: Lustre: Skipped 189 previous similar messages Aug 27 12:56:56 fir-md1-s1 kernel: LustreError: 97638:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 312096: error -110 Aug 27 12:57:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 12:58:05 fir-md1-s1 kernel: LustreError: 21127:0:(ldlm_lockd.c:1285:ldlm_handle_enqueue0()) ### lock on disconnected export ffff8f299c923c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2fac7e8000/0x5d9ee6e65afa34d0 lrc: 2/0,0 mode: --/CR res: [0x2c002cc2e:0x9e0b:0x0].0x0 bits 0x0/0x0 rrc: 4 type: IBT flags: 0x40000000000000 nid: local remote: 0x2d1dd3491a614a80 expref: -99 pid: 21127 timeout: 0 lvb_type: 0 Aug 27 12:58:25 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Aug 27 12:58:25 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Aug 27 12:58:34 fir-md1-s1 kernel: LNet: Service thread pid 23567 was inactive for 200.71s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 12:58:34 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Aug 27 12:58:35 fir-md1-s1 kernel: Pid: 23567, comm: mdt00_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:58:35 fir-md1-s1 kernel: Call Trace: Aug 27 12:58:35 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:58:35 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:58:35 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:58:35 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:58:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:58:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:58:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:58:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:58:35 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:58:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935914.23567 Aug 27 12:58:43 fir-md1-s1 kernel: Pid: 21673, comm: mdt00_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:58:43 fir-md1-s1 kernel: Call Trace: Aug 27 12:58:43 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:58:43 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:58:43 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:58:43 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:58:43 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:58:43 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:58:43 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:58:43 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:58:43 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:58:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935922.21673 Aug 27 12:58:58 fir-md1-s1 kernel: Pid: 24585, comm: mdt01_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:58:58 fir-md1-s1 kernel: Call Trace: Aug 27 12:58:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x890 [ptlrpc] Aug 27 12:58:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Aug 27 12:58:58 
fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Aug 27 12:58:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:58:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:58:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:58:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:58:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:58:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:58:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935938.24585 Aug 27 12:58:59 fir-md1-s1 kernel: Pid: 10583, comm: mdt03_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:58:59 fir-md1-s1 kernel: Call Trace: Aug 27 12:58:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:58:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:58:59 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:58:59 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:58:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:58:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:58:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:58:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:58:59 fir-md1-s1 
kernel: [] 0xffffffffffffffff Aug 27 12:59:27 fir-md1-s1 kernel: Pid: 97639, comm: mdt01_078 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 12:59:27 fir-md1-s1 kernel: Call Trace: Aug 27 12:59:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 12:59:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 12:59:27 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 12:59:27 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 12:59:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 12:59:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 12:59:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 12:59:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 12:59:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 12:59:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566935964.97639 Aug 27 12:59:57 fir-md1-s1 kernel: LustreError: 20545:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 275482: error -110 Aug 27 13:00:06 fir-md1-s1 kernel: LustreError: 23681:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566935905, 93s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2423134140/0x5d9ee6e65afc1f24 lrc: 3/0,1 mode: --/PW res: [0x200029c0f:0x5c9:0x0].0x0 bits 
0x40/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23681 timeout: 0 lvb_type: 0 Aug 27 13:00:06 fir-md1-s1 kernel: LustreError: 23681:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 99 previous similar messages Aug 27 13:00:06 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 4 seconds Aug 27 13:00:06 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 2 previous similar messages Aug 27 13:00:06 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (10): c: 5, oc: 0, rc: 8 Aug 27 13:00:06 fir-md1-s1 kernel: LNetError: 20180:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 2 previous similar messages Aug 27 13:00:06 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 27 seconds Aug 27 13:00:06 fir-md1-s1 kernel: LNet: 20180:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 63 previous similar messages Aug 27 13:00:15 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#17 stuck for 24s! 
[ldlm_cn01_030:25086] Aug 27 13:00:15 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses enclosure dm_multipath ipmi_si pcspkr dm_mod ipmi_devintf k10temp ccp sg i2c_piix4 ipmi_msghandler acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif Aug 27 13:00:15 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx5_core(OE) libahci mlxfw(OE) crct10dif_pclmul devlink crct10dif_common tg3 mlx_compat(OE) drm ptp libata megaraid_sas crc32c_intel drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas Aug 27 13:00:15 fir-md1-s1 kernel: CPU: 17 PID: 25086 Comm: ldlm_cn01_030 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 Aug 27 13:00:15 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 Aug 27 13:00:15 fir-md1-s1 kernel: task: ffff8f101cd31040 ti: ffff8f101d8d0000 task.ti: ffff8f101d8d0000 Aug 27 13:00:15 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Aug 27 13:00:15 fir-md1-s1 kernel: RSP: 0018:ffff8f101d8d3da8 EFLAGS: 00000246 Aug 27 13:00:15 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff8f101d8d3d30 RCX: 0000000000890000 Aug 27 13:00:15 fir-md1-s1 kernel: RDX: ffff8f253f8db780 RSI: 0000000001690001 RDI: ffff8f12e5932288 Aug 27 13:00:15 fir-md1-s1 kernel: RBP: ffff8f101d8d3da8 R08: ffff8f253f71b780 R09: 0000000000000000 Aug 27 13:00:15 fir-md1-s1 kernel: R10: 00000000004c4b40 R11: 0000000000000011 R12: ffffffffc0dc66c0 Aug 27 13:00:15 fir-md1-s1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000015000000000 Aug 27 13:00:15 fir-md1-s1 kernel: FS: 00007f17a30c1700(0000) GS:ffff8f253f700000(0000) knlGS:0000000000000000 Aug 27 13:00:15 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 27 13:00:15 fir-md1-s1 kernel: CR2: 00007f31e7faa000 CR3: 000000103c36c000 CR4: 00000000003407e0 Aug 27 13:00:15 fir-md1-s1 kernel: Call Trace: Aug 27 13:00:15 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Aug 27 13:00:15 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Aug 27 13:00:15 fir-md1-s1 kernel: [] ptlrpc_server_hpreq_fini+0x68/0x170 [ptlrpc] Aug 27 13:00:15 fir-md1-s1 kernel: [] ptlrpc_main+0xdc0/0x1fc0 [ptlrpc] Aug 27 13:00:15 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Aug 27 13:00:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 13:00:15 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Aug 27 13:00:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 13:00:15 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 Aug 27 13:00:15 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 85 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Aug 27 13:00:17 fir-md1-s1 kernel: LustreError: 20374:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.101.56@o2ib4: deadline 30:32s ago req@ffff8f35a0405d00 x1642341108626256/t0(0) o102->f35471dc-4c42-bd06-27d8-a92f6bb41fe4@10.9.101.56@o2ib4:15/0 lens 328/0 e 0 to 0 dl 1566935985 ref 1 fl Interpret:H/2/ffffffff rc 0/-1 Aug 27 13:00:17 fir-md1-s1 kernel: LustreError: 20374:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 153282 previous similar messages Aug 27 13:00:18 fir-md1-s1 kernel: LNetError: 55551:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.18.6@o2ib6 from 10.0.10.51@o2ib7 Aug 27 13:00:18 fir-md1-s1 kernel: LNetError: 55551:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 67 previous similar messages Aug 27 13:00:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566936031.23454 Aug 27 13:00:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566936043.21333 Aug 27 13:01:28 fir-md1-s1 kernel: LustreError: 21452:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 28622: error -110 Aug 27 13:01:34 fir-md1-s1 kernel: LustreError: 36718:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8f0f9716c500 x1636782186322992/t0(0) o105->fir-MDT0002@10.8.26.2@o2ib6:15/16 lens 304/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Aug 27 13:01:34 fir-md1-s1 kernel: LustreError: 36718:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 6 previous similar messages Aug 27 13:01:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566936105.97667 Aug 27 13:02:00 fir-md1-s1 kernel: LustreError: 20553:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on 
destroyed export ffff8f0942ec7c00 ns: mdt-fir-MDT0000_UUID lock: ffff8f3b7cf47080/0x5d9ee6e65b15e457 lrc: 3/0,0 mode: CR/CR res: [0x20002a426:0x1:0x0].0x0 bits 0x9/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.109.55@o2ib4 remote: 0xaa7b415a7caf993c expref: 6 pid: 20553 timeout: 0 lvb_type: 0 Aug 27 13:02:00 fir-md1-s1 kernel: LustreError: 20553:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 43 previous similar messages Aug 27 13:02:38 fir-md1-s1 kernel: LNet: Service thread pid 23666 completed after 6310.89s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 13:02:38 fir-md1-s1 kernel: LNet: Skipped 51 previous similar messages Aug 27 13:02:40 fir-md1-s1 kernel: LustreError: 71897:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk READ after 23+1s req@ffff8f39f528a100 x1636435332972192/t0(0) o37->fc9cbc0d-41e6-18a0-ddfe-91c390cc7652@10.9.108.7@o2ib4:9/0 lens 448/440 e 0 to 0 dl 1566936159 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 13:02:40 fir-md1-s1 kernel: LustreError: 71897:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 3 previous similar messages Aug 27 13:02:58 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Aug 27 13:02:58 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 8 previous similar messages Aug 27 13:03:05 fir-md1-s1 kernel: LustreError: 22280:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566936095, 90s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8f22a78b57c0/0x5d9ee6e65b0e3064 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x5d9ee6e65b0e3087 expref: -99 pid: 22280 timeout: 0 lvb_type: 0 Aug 27 13:03:15 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 
d32c9b39-44a5-66ef-3dc3-72b5663de669 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f1dfef2e000, cur 1566936195 expire 1566936045 last 1566935968 Aug 27 13:03:15 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Aug 27 13:03:39 fir-md1-s1 kernel: LustreError: 31005:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.108.70@o2ib4 arrived at 1566936219 with bad export cookie 6746083168864022596 Aug 27 13:03:39 fir-md1-s1 kernel: LustreError: 31005:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1672 previous similar messages Aug 27 13:03:51 fir-md1-s1 kernel: LustreError: 20955:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f4128b6b900 x1631565412813712/t0(0) o37->f7d39296-2681-999e-c9dd-38a3ef8bf584@10.9.106.15@o2ib4:20/0 lens 448/440 e 0 to 0 dl 1566936260 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 13:03:51 fir-md1-s1 kernel: LustreError: 20955:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 33 previous similar messages Aug 27 13:03:52 fir-md1-s1 kernel: LustreError: 20192:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f44ef314a00 Aug 27 13:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f174f128-4488-2485-c92d-799c5cc7f49d (at 10.9.104.27@o2ib4) Aug 27 13:04:31 fir-md1-s1 kernel: Lustre: Skipped 29886 previous similar messages Aug 27 13:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8ec1acae-5541-1224-6330-34435f948ba9 (at 10.9.106.61@o2ib4) reconnecting Aug 27 13:04:48 fir-md1-s1 kernel: Lustre: Skipped 26940 previous similar messages Aug 27 13:06:44 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Tue Aug 27 13:06:44 2019 Aug 27 13:06:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.113.4@o2ib4, removing former export from same NID Aug 27 13:06:47 fir-md1-s1 kernel: Lustre: Skipped 1694 previous similar messages Aug 27 13:08:29 fir-md1-s1 kernel: Lustre: 
21333:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26893c7500 x1643047332149216/t0(0) o101->01220ca0-c29f-4cb8-bddb-c495482aa608@10.9.0.61@o2ib4:4/0 lens 584/3264 e 0 to 0 dl 1566936514 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 13:08:29 fir-md1-s1 kernel: Lustre: 21333:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2280177 previous similar messages Aug 27 13:10:04 fir-md1-s1 kernel: LustreError: 21452:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566936514, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2d9e083180/0x5d9ee6e65c38e54b lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21452 timeout: 0 lvb_type: 0 Aug 27 13:10:04 fir-md1-s1 kernel: LustreError: 21452:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 24 previous similar messages Aug 27 13:11:25 fir-md1-s1 kernel: LNet: Service thread pid 23755 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 13:11:25 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 13:11:25 fir-md1-s1 kernel: Pid: 23755, comm: mdt02_105 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 13:11:25 fir-md1-s1 kernel: Call Trace: Aug 27 13:11:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 13:11:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 13:11:25 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 13:11:25 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 13:11:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 13:11:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 13:11:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 13:11:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 13:11:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 13:11:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566936685.23755 Aug 27 13:11:54 fir-md1-s1 kernel: Pid: 21452, comm: mdt02_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 13:11:54 fir-md1-s1 kernel: Call Trace: Aug 27 13:11:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 13:11:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 13:11:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 13:11:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 13:11:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 13:11:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 13:11:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 13:11:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 13:11:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 13:11:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566936714.21452 Aug 27 13:12:24 fir-md1-s1 kernel: Pid: 24578, comm: mdt01_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 13:12:24 fir-md1-s1 kernel: Call Trace: Aug 27 13:12:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 13:12:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 13:12:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 13:12:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 13:12:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 13:12:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 13:12:24 
fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 13:12:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 13:12:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 13:12:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 13:12:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566936744.24578 Aug 27 13:14:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 13:14:33 fir-md1-s1 kernel: Lustre: Skipped 546 previous similar messages Aug 27 13:14:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f1bad2aa-6db1-dd20-85dc-e36aabd3f07a (at 10.9.103.34@o2ib4) reconnecting Aug 27 13:14:48 fir-md1-s1 kernel: Lustre: Skipped 536 previous similar messages Aug 27 13:17:07 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Tue Aug 27 13:17:07 2019 Aug 27 13:19:57 fir-md1-s1 kernel: LNet: Service thread pid 20457 completed after 7765.42s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 27 13:19:57 fir-md1-s1 kernel: LustreError: 23690:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f305f5dfc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f369a477980/0x5d9ee6e657d2a99f lrc: 3/0,0 mode: PW/PW res: [0x2c002c013:0xa64a:0x0].0x0 bits 0x40/0x0 rrc: 25 type: IBT flags: 0x50200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897fd9761b3a expref: 6 pid: 23690 timeout: 0 lvb_type: 0 Aug 27 13:19:57 fir-md1-s1 kernel: LustreError: 23690:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 16 previous similar messages Aug 27 13:19:57 fir-md1-s1 kernel: Lustre: 23690:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6574:1023s); client may timeout. req@ffff8f43a151f800 x1631586669457248/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:20/0 lens 480/536 e 0 to 0 dl 1566936174 ref 1 fl Complete:/0/0 rc -107/-107 Aug 27 13:19:57 fir-md1-s1 kernel: Lustre: 23690:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 19099373 previous similar messages Aug 27 13:19:57 fir-md1-s1 kernel: LNet: Skipped 49 previous similar messages Aug 27 13:20:32 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-15), not sending early reply req@ffff8f4289d4f500 x1631608571706848/t0(0) o101->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:7/0 lens 480/568 e 0 to 0 dl 1566937237 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 13:20:32 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 27 13:21:27 fir-md1-s1 kernel: LustreError: 20554:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566937197, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0f677c6c00/0x5d9ee6e6631b95bb lrc: 3/0,1 mode: --/PW res: [0x2c002c013:0xa64a:0x0].0x0 bits 0x40/0x0 rrc: 24 type: IBT flags: 
0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20554 timeout: 0 lvb_type: 0 Aug 27 13:21:27 fir-md1-s1 kernel: LustreError: 20554:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 27 13:21:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 4f0e2a9a-5e0c-e83b-3e5a-386d20c11435 (at 10.9.102.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f25303b7400, cur 1566937289 expire 1566937139 last 1566937062 Aug 27 13:22:44 fir-md1-s1 kernel: Lustre: 10151:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (95:6781s); client may timeout. req@ffff8f27cdd34b00 x1631571816709376/t0(0) o101->4b6e4105-ad27-7331-49d4-b54bb82f1685@10.9.105.21@o2ib4:11/0 lens 480/536 e 0 to 0 dl 1566930583 ref 1 fl Complete:/0/0 rc -107/-107 Aug 27 13:23:17 fir-md1-s1 kernel: LNet: Service thread pid 20554 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 13:23:17 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Aug 27 13:23:17 fir-md1-s1 kernel: Pid: 20554, comm: mdt03_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 13:23:17 fir-md1-s1 kernel: Call Trace: Aug 27 13:23:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 13:23:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 13:23:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Aug 27 13:23:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Aug 27 13:23:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Aug 27 13:23:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 13:23:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 13:23:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 13:23:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 13:23:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 13:23:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566937397.20554 Aug 27 13:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e51acf95-3440-c728-d0af-203c8ae1e157 (at 10.9.103.34@o2ib4) Aug 27 13:24:37 fir-md1-s1 kernel: Lustre: Skipped 523 previous similar messages Aug 27 13:24:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 683a48e7-a11e-d27d-92b8-e668e8ebb59d (at 10.9.102.47@o2ib4) 
reconnecting Aug 27 13:24:49 fir-md1-s1 kernel: Lustre: Skipped 510 previous similar messages Aug 27 13:34:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 27 13:34:38 fir-md1-s1 kernel: Lustre: Skipped 505 previous similar messages Aug 27 13:34:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d31107d5-0348-6f95-7970-9bba1ab39904 (at 10.9.102.24@o2ib4) reconnecting Aug 27 13:34:49 fir-md1-s1 kernel: Lustre: Skipped 502 previous similar messages Aug 27 13:36:51 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Tue Aug 27 13:36:51 2019 Aug 27 13:44:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1062a832-c778-75e1-da43-5a08be7649ee (at 10.9.102.38@o2ib4) Aug 27 13:44:38 fir-md1-s1 kernel: Lustre: Skipped 505 previous similar messages Aug 27 13:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8a37f7b1-3efc-30e9-f8d1-739df6680357 (at 10.9.104.19@o2ib4) reconnecting Aug 27 13:44:50 fir-md1-s1 kernel: Lustre: Skipped 503 previous similar messages Aug 27 13:54:17 fir-md1-s1 kernel: Lustre: 23554:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 27 13:54:36 fir-md1-s1 kernel: Lustre: 25678:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 27 13:54:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 25452b8c-89d8-4861-e6b1-0b6a1535cde3 (at 10.9.104.19@o2ib4) Aug 27 13:54:39 fir-md1-s1 kernel: Lustre: Skipped 506 previous similar messages Aug 27 13:54:41 fir-md1-s1 kernel: Lustre: 23653:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 27 13:54:41 fir-md1-s1 kernel: Lustre: 23653:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Aug 27 13:54:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 
27 13:54:51 fir-md1-s1 kernel: Lustre: Skipped 503 previous similar messages Aug 27 13:54:53 fir-md1-s1 kernel: Lustre: 23574:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 27 13:55:01 fir-md1-s1 kernel: Lustre: 23577:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 27 13:55:25 fir-md1-s1 kernel: Lustre: 23561:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Aug 27 13:56:10 fir-md1-s1 kernel: Lustre: 23605:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e5d385d00 x1643052940943472/t0(0) o101->f37a46e0-1e70-6b27-1459-0c7be76fae27@10.0.10.3@o2ib7:15/0 lens 576/3264 e 1 to 0 dl 1566939375 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 13:57:25 fir-md1-s1 kernel: LustreError: 23554:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566939355, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2e029357c0/0x5d9ee6e677e4ed55 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x12/0x0 rrc: 14 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23554 timeout: 0 lvb_type: 0 Aug 27 13:57:33 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.115.3@o2ib4, removing former export from same NID Aug 27 13:57:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 13:59:15 fir-md1-s1 kernel: LNet: Service thread pid 23554 was inactive for 200.73s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 13:59:15 fir-md1-s1 kernel: Pid: 23554, comm: mdt00_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 13:59:15 fir-md1-s1 kernel: Call Trace: Aug 27 13:59:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 13:59:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 13:59:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 13:59:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 13:59:15 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Aug 27 13:59:15 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 13:59:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 13:59:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 13:59:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 13:59:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 13:59:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 13:59:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 13:59:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 13:59:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 13:59:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 13:59:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 13:59:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566939556.23554 Aug 27 14:04:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 14:04:40 fir-md1-s1 kernel: Lustre: Skipped 533 previous similar messages Aug 27 14:04:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f1bad2aa-6db1-dd20-85dc-e36aabd3f07a (at 10.9.103.34@o2ib4) reconnecting Aug 27 14:04:55 fir-md1-s1 kernel: Lustre: Skipped 534 previous similar messages Aug 27 14:08:11 fir-md1-s1 kernel: 
LustreError: 20468:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f305f5dfc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f369a474a40/0x5d9ee6e657d2a9ad lrc: 3/0,0 mode: PW/PW res: [0x2c002c013:0xa64a:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x50200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897fd9761b33 expref: 4 pid: 20468 timeout: 0 lvb_type: 0 Aug 27 14:08:11 fir-md1-s1 kernel: Lustre: 23656:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (904:9339s); client may timeout. req@ffff8f2cd6131200 x1634188304182896/t0(0) o101->95cdb8fb-0e32-cb98-88bc-c0e9f3ec6a0b@10.9.109.57@o2ib4:28/0 lens 480/536 e 0 to 0 dl 1566930752 ref 1 fl Complete:/0/0 rc -107/-107 Aug 27 14:08:11 fir-md1-s1 kernel: LNet: Service thread pid 23656 completed after 10243.10s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 14:08:11 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 27 14:08:11 fir-md1-s1 kernel: LustreError: 20468:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Aug 27 14:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e51acf95-3440-c728-d0af-203c8ae1e157 (at 10.9.103.34@o2ib4) Aug 27 14:14:44 fir-md1-s1 kernel: Lustre: Skipped 539 previous similar messages Aug 27 14:14:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 683a48e7-a11e-d27d-92b8-e668e8ebb59d (at 10.9.102.47@o2ib4) reconnecting Aug 27 14:14:56 fir-md1-s1 kernel: Lustre: Skipped 531 previous similar messages Aug 27 14:24:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 27 14:24:45 fir-md1-s1 kernel: Lustre: Skipped 531 previous similar messages Aug 27 14:24:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d31107d5-0348-6f95-7970-9bba1ab39904 (at 10.9.102.24@o2ib4) reconnecting Aug 27 14:24:56 fir-md1-s1 kernel: Lustre: Skipped 
529 previous similar messages Aug 27 14:25:03 fir-md1-s1 kernel: LNet: Service thread pid 97666 completed after 11500.19s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 27 14:25:03 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 14:25:03 fir-md1-s1 kernel: LustreError: 21410:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f305f5dfc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f1284da4380/0x5d9ee6e658006be0 lrc: 3/0,0 mode: PW/PW res: [0x2c002c013:0xa64a:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x50200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897fd9761d9b expref: 2 pid: 21410 timeout: 0 lvb_type: 0 Aug 27 14:25:03 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6391:4929s); client may timeout. req@ffff8f1b48445d00 x1631586669485696/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:26/0 lens 480/536 e 0 to 0 dl 1566936174 ref 1 fl Complete:/0/0 rc -107/-107 Aug 27 14:25:03 fir-md1-s1 kernel: Lustre: 21410:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 27 14:33:38 fir-md1-s1 kernel: Lustre: 23740:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2c269ac200 x1642705087564992/t0(0) o101->914b63c8-3a12-8009-32f3-deaae1cd82be@10.8.0.68@o2ib6:13/0 lens 584/3264 e 0 to 0 dl 1566941623 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 14:34:10 fir-md1-s1 kernel: Lustre: 23740:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-7), not sending early reply req@ffff8f259015f500 x1642705087937232/t0(0) o101->914b63c8-3a12-8009-32f3-deaae1cd82be@10.8.0.68@o2ib6:15/0 lens 584/3264 e 1 to 0 dl 1566941655 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 14:34:43 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### 
lock timed out (enqueued at 1566941593, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f28934e18c0/0x5d9ee6e686d8d27d lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 30 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23645 timeout: 0 lvb_type: 0 Aug 27 14:34:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 1062a832-c778-75e1-da43-5a08be7649ee (at 10.9.102.38@o2ib4) Aug 27 14:34:45 fir-md1-s1 kernel: Lustre: Skipped 458 previous similar messages Aug 27 14:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 357ed5e6-797d-063b-772c-730368f05495 (at 10.9.103.26@o2ib4) reconnecting Aug 27 14:34:58 fir-md1-s1 kernel: Lustre: Skipped 459 previous similar messages Aug 27 14:35:13 fir-md1-s1 kernel: LustreError: 50442:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566941623, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ee5499f80/0x5d9ee6e686e6af45 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 30 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50442 timeout: 0 lvb_type: 0 Aug 27 14:35:56 fir-md1-s1 kernel: Lustre: 97640:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f20dd46bc00 x1636427141169616/t0(0) o101->f29e1e71-511a-3e98-949d-3f54561359cc@10.9.101.58@o2ib4:1/0 lens 592/3264 e 0 to 0 dl 1566941761 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 14:36:34 fir-md1-s1 kernel: LNet: Service thread pid 23645 was inactive for 200.54s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 14:36:34 fir-md1-s1 kernel: Pid: 23645, comm: mdt02_061 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 14:36:34 fir-md1-s1 kernel: Call Trace: Aug 27 14:36:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 14:36:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 14:36:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 14:36:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 14:36:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 14:36:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 14:36:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 14:36:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 14:36:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 14:36:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566941794.23645 Aug 27 14:37:01 fir-md1-s1 kernel: LustreError: 20460:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566941731, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1f55906c00/0x5d9ee6e6876173a8 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20460 timeout: 0 lvb_type: 0 
Aug 27 14:37:04 fir-md1-s1 kernel: LNet: Service thread pid 50442 was inactive for 200.24s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 14:37:04 fir-md1-s1 kernel: Pid: 50442, comm: mdt02_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 14:37:04 fir-md1-s1 kernel: Call Trace: Aug 27 14:37:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 14:37:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 14:37:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 14:37:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 14:37:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 14:37:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 14:37:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 14:37:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 14:37:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 14:37:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566941824.50442 Aug 27 14:38:51 fir-md1-s1 kernel: LNet: Service thread pid 20460 was inactive for 200.54s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 14:38:51 fir-md1-s1 kernel: Pid: 20460, comm: mdt01_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 14:38:51 fir-md1-s1 kernel: Call Trace: Aug 27 14:38:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 14:38:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 14:38:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 14:38:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 14:38:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 14:38:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 14:38:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 14:38:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 14:38:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 14:38:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566941931.20460 Aug 27 14:44:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 14:44:46 fir-md1-s1 kernel: Lustre: Skipped 500 previous similar messages Aug 27 14:44:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 27 14:44:59 fir-md1-s1 kernel: Lustre: Skipped 500 previous similar messages Aug 27 14:51:03 fir-md1-s1 kernel: 
Lustre: 23700:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f27fe04b600 x1642705092397056/t0(0) o101->914b63c8-3a12-8009-32f3-deaae1cd82be@10.8.0.68@o2ib6:8/0 lens 592/3264 e 1 to 0 dl 1566942668 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 14:52:18 fir-md1-s1 kernel: LustreError: 10144:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566942648, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f289473a400/0x5d9ee6e68a3e378b lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 16 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10144 timeout: 0 lvb_type: 0 Aug 27 14:54:09 fir-md1-s1 kernel: LNet: Service thread pid 10144 was inactive for 200.42s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 14:54:09 fir-md1-s1 kernel: Pid: 10144, comm: mdt02_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 14:54:09 fir-md1-s1 kernel: Call Trace: Aug 27 14:54:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 14:54:09 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 14:54:09 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 14:54:09 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 14:54:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 14:54:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 14:54:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 14:54:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 14:54:09 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 14:54:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566942849.10144 Aug 27 14:54:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 14:54:48 fir-md1-s1 kernel: Lustre: Skipped 519 previous similar messages Aug 27 14:55:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 14:55:00 fir-md1-s1 kernel: Lustre: Skipped 517 previous similar messages Aug 27 15:04:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 15:04:48 fir-md1-s1 kernel: Lustre: Skipped 537 previous similar messages Aug 27 15:05:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1f080c09-2fe8-8c89-493e-0f353450ad44 (at 10.9.102.20@o2ib4) reconnecting Aug 27 15:05:01 fir-md1-s1 kernel: Lustre: Skipped 528 previous similar messages Aug 27 15:14:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.102.20@o2ib4) Aug 27 15:14:50 fir-md1-s1 kernel: Lustre: Skipped 528 previous similar messages Aug 27 15:15:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f37a46e0-1e70-6b27-1459-0c7be76fae27 (at 10.0.10.3@o2ib7) reconnecting Aug 27 15:15:02 fir-md1-s1 kernel: Lustre: Skipped 530 previous similar messages Aug 27 15:24:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5a6d64b7-8368-162d-98b4-716457bd6d0c (at 10.9.102.19@o2ib4) Aug 27 15:24:52 fir-md1-s1 kernel: Lustre: Skipped 531 previous similar messages Aug 27 15:25:05 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 357ed5e6-797d-063b-772c-730368f05495 (at 10.9.103.26@o2ib4) reconnecting Aug 27 15:25:05 fir-md1-s1 kernel: Lustre: Skipped 533 previous similar messages Aug 27 15:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 97e1f265-2602-c471-9ba2-e911e8b1f2ac (at 10.8.6.26@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f296c754800, cur 1566944853 expire 1566944703 last 1566944626 Aug 27 15:27:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 15:34:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 15:34:54 fir-md1-s1 kernel: Lustre: Skipped 535 previous similar messages Aug 27 15:35:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 27 15:35:06 fir-md1-s1 kernel: Lustre: Skipped 530 previous similar messages Aug 27 15:38:35 fir-md1-s1 kernel: Lustre: 10146:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f278aa0aa00 x1641916035374208/t0(0) o101->d96d2d4a-213c-de28-afa6-2cb1bee603bd@10.8.17.22@o2ib6:10/0 lens 592/3264 e 1 to 0 dl 1566945520 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 15:38:48 fir-md1-s1 kernel: Lustre: 23704:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f2c55c55d00 x1641928476205344/t0(0) o101->af04ed26-0e4b-db45-3414-20245014a46d@10.8.27.34@o2ib6:23/0 lens 592/3264 e 1 to 0 dl 1566945533 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 15:38:58 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f18aef10600 x1637903338737264/t0(0) o101->aa3ee41d-cac0-6749-5220-bb62e9eebc36@10.8.28.5@o2ib6:3/0 lens 592/3264 e 0 to 0 dl 1566945543 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 15:38:58 fir-md1-s1 
kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 27 15:39:50 fir-md1-s1 kernel: LustreError: 23700:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566945500, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2e43c6f740/0x5d9ee6e68e38bd65 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 23 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23700 timeout: 0 lvb_type: 0 Aug 27 15:40:03 fir-md1-s1 kernel: LustreError: 23616:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566945513, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2e91810900/0x5d9ee6e68e3ba541 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 23 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23616 timeout: 0 lvb_type: 0 Aug 27 15:41:40 fir-md1-s1 kernel: LNet: Service thread pid 23700 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 15:41:40 fir-md1-s1 kernel: Pid: 23700, comm: mdt02_078 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 15:41:40 fir-md1-s1 kernel: Call Trace: Aug 27 15:41:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 15:41:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 15:41:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 15:41:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 15:41:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 15:41:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 15:41:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 15:41:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 15:41:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 15:41:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566945700.23700 Aug 27 15:41:54 fir-md1-s1 kernel: LNet: Service thread pid 23616 was inactive for 200.59s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 15:41:54 fir-md1-s1 kernel: Pid: 23616, comm: mdt02_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 15:41:54 fir-md1-s1 kernel: Call Trace: Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 15:41:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 15:41:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 15:41:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566945714.23616 Aug 27 15:41:54 fir-md1-s1 kernel: Pid: 20724, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 15:41:54 fir-md1-s1 kernel: Call Trace: Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 15:41:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 15:41:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 15:41:54 fir-md1-s1 kernel: Pid: 50448, comm: mdt01_076 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 15:41:54 fir-md1-s1 kernel: Call Trace: Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 15:41:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 15:41:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 15:41:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 15:41:55 fir-md1-s1 kernel: Pid: 97646, comm: mdt01_085 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 15:41:55 fir-md1-s1 kernel: Call Trace: Aug 27 15:41:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 15:41:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 15:41:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 15:41:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 15:41:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 15:41:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 15:41:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 15:41:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 15:41:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 15:41:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566945715.97646 Aug 27 15:41:55 fir-md1-s1 kernel: LNet: Service thread pid 23652 was inactive for 200.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Aug 27 15:41:55 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 27 15:44:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 15:44:55 fir-md1-s1 kernel: Lustre: Skipped 638 previous similar messages Aug 27 15:45:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f37a46e0-1e70-6b27-1459-0c7be76fae27 (at 10.0.10.3@o2ib7) reconnecting Aug 27 15:45:08 fir-md1-s1 kernel: Lustre: Skipped 644 previous similar messages Aug 27 15:54:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 27 15:54:56 fir-md1-s1 kernel: Lustre: Skipped 705 previous similar messages Aug 27 15:55:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1f080c09-2fe8-8c89-493e-0f353450ad44 (at 10.9.102.20@o2ib4) reconnecting Aug 27 15:55:08 fir-md1-s1 kernel: Lustre: Skipped 704 previous similar messages Aug 27 15:55:39 fir-md1-s1 kernel: Lustre: 21672:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f282d245a00 x1642588021807920/t0(0) o101->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:14/0 lens 592/3264 e 1 to 0 dl 1566946544 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 15:55:39 fir-md1-s1 kernel: Lustre: 21672:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 27 15:56:54 fir-md1-s1 kernel: LustreError: 20464:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566946524, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2ead4a1680/0x5d9ee6e691c39a3a lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 24 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20464 timeout: 0 lvb_type: 0 Aug 27 15:56:54 fir-md1-s1 kernel: LustreError: 20464:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) 
Skipped 5 previous similar messages Aug 27 15:58:44 fir-md1-s1 kernel: LNet: Service thread pid 20464 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 15:58:44 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Aug 27 15:58:44 fir-md1-s1 kernel: Pid: 20464, comm: mdt02_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 15:58:44 fir-md1-s1 kernel: Call Trace: Aug 27 15:58:44 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 15:58:44 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 15:58:44 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 15:58:44 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 15:58:44 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 15:58:44 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 15:58:44 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 15:58:44 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 15:58:44 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 15:58:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566946724.20464 Aug 27 16:04:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.102.20@o2ib4) Aug 27 16:04:57 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 
16:05:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client af04ed26-0e4b-db45-3414-20245014a46d (at 10.8.27.34@o2ib6) reconnecting Aug 27 16:05:10 fir-md1-s1 kernel: Lustre: Skipped 726 previous similar messages Aug 27 16:12:54 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 16:12:54 fir-md1-s1 kernel: LNetError: 20195:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 58 previous similar messages Aug 27 16:13:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.103.15@o2ib4, removing former export from same NID Aug 27 16:13:41 fir-md1-s1 kernel: Lustre: 21414:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d0244cb00 x1641151745344336/t0(0) o101->666ef9b9-c560-ec4f-20a2-4b6d1150cfc7@10.8.27.25@o2ib6:16/0 lens 576/3264 e 0 to 0 dl 1566947626 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:13:42 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f26ea3d2100 x1638897629149376/t0(0) o101->98dfa4cc-0720-0891-dfe6-224d79a14e18@10.8.17.11@o2ib6:17/0 lens 576/3264 e 0 to 0 dl 1566947627 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:13:42 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Aug 27 16:13:43 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f33301a4e00 x1638897629151040/t0(0) o101->98dfa4cc-0720-0891-dfe6-224d79a14e18@10.8.17.11@o2ib6:18/0 lens 576/3264 e 0 to 0 dl 1566947628 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:13:43 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 39 previous similar messages Aug 27 16:13:45 fir-md1-s1 kernel: Lustre: 
21458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1566947618/real 0] req@ffff8f22d35d5400 x1636782257606976/t0(0) o104->fir-MDT0000@10.9.102.6@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566947625 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Aug 27 16:13:45 fir-md1-s1 kernel: Lustre: 21458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 104052 previous similar messages Aug 27 16:13:45 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2d86f25100 x1641153159457584/t0(0) o101->681b3db7-df93-2f94-76d9-435e49ae8be8@10.8.8.20@o2ib6:20/0 lens 576/3264 e 0 to 0 dl 1566947630 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:13:45 fir-md1-s1 kernel: Lustre: 23679:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 51 previous similar messages Aug 27 16:13:45 fir-md1-s1 kernel: LustreError: 49250:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8f065b27d050 x1631569934057824/t0(0) o4->25c05458-1ff8-5b3c-505b-360943a414ba@10.9.104.66@o2ib4:5/0 lens 488/448 e 0 to 0 dl 1566947645 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 16:13:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 25c05458-1ff8-5b3c-505b-360943a414ba (at 10.9.104.66@o2ib4), client will retry: rc = -110 Aug 27 16:13:45 fir-md1-s1 kernel: LustreError: 49250:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Aug 27 16:13:49 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.47@o2ib4, removing former export from same NID Aug 27 16:13:49 fir-md1-s1 kernel: Lustre: Skipped 154 previous similar messages Aug 27 16:13:50 fir-md1-s1 kernel: Lustre: 23697:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f3e8ea7da00 x1635104276163360/t0(0) o101->d20fca7d-014d-af30-ddc5-1fb31528f1e1@10.9.109.43@o2ib4:25/0 lens 1776/3288 e 0 to 0 dl 1566947635 ref 
2 fl Interpret:/0/0 rc 0/0 Aug 27 16:13:50 fir-md1-s1 kernel: Lustre: 23697:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Aug 27 16:14:02 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f4485bb6f00 x1634153563006128/t0(0) o101->a8495761-7359-3610-2479-b4da362523dd@10.9.101.31@o2ib4:7/0 lens 400/1600 e 0 to 0 dl 1566947647 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:14:02 fir-md1-s1 kernel: Lustre: 23736:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Aug 27 16:14:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.69@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f3034b01d40/0x5d9ee6e698b06d57 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 499 type: IBT flags: 0x60200400000020 nid: 10.9.108.69@o2ib4 remote: 0x64fc8ba6518775ca expref: 728 pid: 10151 timeout: 6062707 lvb_type: 0 Aug 27 16:14:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 80 previous similar messages Aug 27 16:14:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.109.68@o2ib4, removing former export from same NID Aug 27 16:14:08 fir-md1-s1 kernel: Lustre: Skipped 310 previous similar messages Aug 27 16:14:08 fir-md1-s1 kernel: Lustre: 97656:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:21s); client may timeout. 
req@ffff8f251b80a700 x1635099905584080/t0(0) o101->d6c95989-a33e-02cc-37c5-1e98ca81c68c@10.9.105.2@o2ib4:17/0 lens 576/536 e 0 to 0 dl 1566947627 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 16:14:09 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 16:14:09 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 55 previous similar messages Aug 27 16:14:09 fir-md1-s1 kernel: Lustre: 21458:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. req@ffff8f1cfe5ece00 x1635095246192640/t446501547093(0) o101->9234d6a3-de0f-63f4-f884-c9cfe5f61af5@10.9.102.6@o2ib4:7/0 lens 376/1040 e 0 to 0 dl 1566947647 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 16:14:11 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f12c36ebc00 Aug 27 16:14:14 fir-md1-s1 kernel: Lustre: 24582:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:6s); client may timeout. 
req@ffff8f22d35d0f00 x1635104889274912/t360816070937(0) o101->7b713982-0a61-76e3-fb94-afdf7658a450@10.9.105.3@o2ib4:8/0 lens 1792/1192 e 0 to 0 dl 1566947648 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 16:14:14 fir-md1-s1 kernel: Lustre: 24582:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 27 16:14:33 fir-md1-s1 kernel: Lustre: 21414:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f32bc63c200 x1641151745348272/t0(0) o101->666ef9b9-c560-ec4f-20a2-4b6d1150cfc7@10.8.27.25@o2ib6:8/0 lens 576/3264 e 0 to 0 dl 1566947678 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:14:33 fir-md1-s1 kernel: Lustre: 21414:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 201 previous similar messages Aug 27 16:14:37 fir-md1-s1 kernel: LustreError: 23649:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f148d233400 ns: mdt-fir-MDT0002_UUID lock: ffff8f4094927740/0x5d9ee6e698b180d4 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 490 type: IBT flags: 0x50200400000020 nid: 10.9.109.31@o2ib4 remote: 0xf2594c4398bc7f20 expref: 2 pid: 23649 timeout: 0 lvb_type: 0 Aug 27 16:14:37 fir-md1-s1 kernel: LustreError: 97655:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cbeddf232650 vs. last_xid 5cbeddf29fdff req@ffff8f1a4ab73f00 x1631597394863696/t0(0) o101->9a1469ab-c675-66e9-07cc-7b69a63273a8@10.9.101.2@o2ib4:8/0 lens 328/0 e 0 to 0 dl 1566947678 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Aug 27 16:14:37 fir-md1-s1 kernel: LustreError: 97655:0:(tgt_handler.c:644:process_req_last_xid()) Skipped 1 previous similar message Aug 27 16:14:37 fir-md1-s1 kernel: Lustre: 23727:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (78:1s); client may timeout. 
req@ffff8f377a535a00 x1631608576461664/t0(0) o101->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:18/0 lens 576/536 e 0 to 0 dl 1566947676 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 16:14:38 fir-md1-s1 kernel: Lustre: 23727:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Aug 27 16:14:42 fir-md1-s1 kernel: LustreError: 20193:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f22171bac00 Aug 27 16:14:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.113.13@o2ib4, removing former export from same NID Aug 27 16:14:45 fir-md1-s1 kernel: Lustre: Skipped 305 previous similar messages Aug 27 16:14:49 fir-md1-s1 kernel: LustreError: 22287:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947598, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1e56d01440/0x5d9ee6e698b181b4 lrc: 3/0,1 mode: --/CW res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x2/0x0 rrc: 501 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22287 timeout: 0 lvb_type: 0 Aug 27 16:14:49 fir-md1-s1 kernel: LustreError: 22287:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 37 previous similar messages Aug 27 16:14:49 fir-md1-s1 kernel: LustreError: 50582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947599, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2cddf52880/0x5d9ee6e698b1ca0e lrc: 3/1,0 mode: --/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 501 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 50582 timeout: 0 lvb_type: 0 Aug 27 16:14:49 fir-md1-s1 kernel: LustreError: 50582:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Aug 27 16:14:50 fir-md1-s1 kernel: LustreError: 
10148:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947600, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2b77532400/0x5d9ee6e698b207be lrc: 3/1,0 mode: --/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 501 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 10148 timeout: 0 lvb_type: 0 Aug 27 16:14:50 fir-md1-s1 kernel: LustreError: 10148:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Aug 27 16:14:52 fir-md1-s1 kernel: LustreError: 23751:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947602, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f34f1cdf2c0/0x5d9ee6e698b25cf2 lrc: 3/1,0 mode: --/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 502 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23751 timeout: 0 lvb_type: 0 Aug 27 16:14:52 fir-md1-s1 kernel: LustreError: 23751:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 19 previous similar messages Aug 27 16:14:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ddc05bda-b940-b399-52f0-62d512a4550b (at 10.8.6.26@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f0d0804d000, cur 1566947694 expire 1566947544 last 1566947467 Aug 27 16:14:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 16:14:56 fir-md1-s1 kernel: LustreError: 22007:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947606, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f16550b4c80/0x5d9ee6e698b3583c lrc: 3/1,0 mode: --/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 503 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22007 timeout: 0 lvb_type: 0 Aug 27 16:14:56 fir-md1-s1 kernel: LustreError: 22007:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 25 previous similar messages Aug 27 16:14:58 fir-md1-s1 kernel: Lustre: 21415:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (79:20s); client may timeout. req@ffff8f2749df6f00 x1631548179283504/t0(0) o101->f5f74966-59a2-6619-dc33-28e321e9f975@10.9.108.31@o2ib4:18/0 lens 576/536 e 0 to 0 dl 1566947677 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 16:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3dfa6e70-63bf-e5c0-ffad-f19ce427da6f (at 10.9.108.69@o2ib4) Aug 27 16:14:58 fir-md1-s1 kernel: Lustre: Skipped 1899 previous similar messages Aug 27 16:15:06 fir-md1-s1 kernel: Lustre: 23584:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2bf2183000 x1635208838533440/t0(0) o101->04874f63-dfd7-2a1b-9b5b-da39adcf93d5@10.9.109.42@o2ib4:11/0 lens 576/3264 e 0 to 0 dl 1566947711 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:15:06 fir-md1-s1 kernel: Lustre: 23584:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 209 previous similar messages Aug 27 16:15:07 fir-md1-s1 kernel: LustreError: 21892:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f23a8d36300 x1642588057746832/t0(0) 
o37->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:4/0 lens 448/440 e 0 to 0 dl 1566947734 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 16:15:07 fir-md1-s1 kernel: LustreError: 21892:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Aug 27 16:15:07 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f24ee5f7200 Aug 27 16:15:07 fir-md1-s1 kernel: LustreError: 20729:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947617, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f24ce583f00/0x5d9ee6e698b3dc55 lrc: 3/1,0 mode: --/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 505 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20729 timeout: 0 lvb_type: 0 Aug 27 16:15:07 fir-md1-s1 kernel: LustreError: 20729:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages Aug 27 16:15:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b05916b7-111a-93c6-b801-46b72244d611 (at 10.9.108.68@o2ib4) reconnecting Aug 27 16:15:10 fir-md1-s1 kernel: Lustre: Skipped 1014 previous similar messages Aug 27 16:15:17 fir-md1-s1 kernel: LustreError: 23683:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f250d8dfc00 ns: mdt-fir-MDT0002_UUID lock: ffff8f40ffb257c0/0x5d9ee6e698b35189 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 493 type: IBT flags: 0x50200400000020 nid: 10.9.108.69@o2ib4 remote: 0x64fc8ba651877680 expref: 2 pid: 23683 timeout: 0 lvb_type: 0 Aug 27 16:15:17 fir-md1-s1 kernel: Lustre: 23683:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:81s); client may timeout. 
req@ffff8f088cabe300 x1636458047781344/t0(0) o101->70888cbb-e6cc-9516-9888-6377df5e01da@10.9.108.69@o2ib4:26/0 lens 576/536 e 0 to 0 dl 1566947636 ref 1 fl Complete:/0/0 rc -107/-107 Aug 27 16:15:48 fir-md1-s1 kernel: LustreError: 23733:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947658, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f27f5526540/0x5d9ee6e698b569b1 lrc: 3/0,1 mode: --/CW res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x2/0x0 rrc: 531 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23733 timeout: 0 lvb_type: 0 Aug 27 16:15:48 fir-md1-s1 kernel: LustreError: 23733:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Aug 27 16:15:52 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.107.40@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f295a39ad00/0x5d9ee6e698b54060 lrc: 3/0,0 mode: PR/PR res: [0x2c0014fbb:0x115fc:0x0].0x0 bits 0x13/0x0 rrc: 533 type: IBT flags: 0x60200400000020 nid: 10.9.107.40@o2ib4 remote: 0x420aebecb0d4021a expref: 724 pid: 10309 timeout: 6062812 lvb_type: 0 Aug 27 16:15:52 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Aug 27 16:15:53 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (83:5s); client may timeout. 
req@ffff8f26150aa700 x1639156453124688/t0(0) o101->c494cabb-e59d-df60-ca54-c4f84b0133be@10.9.108.28@o2ib4:24/0 lens 576/536 e 0 to 0 dl 1566947747 ref 1 fl Complete:/0/0 rc 0/0 Aug 27 16:15:53 fir-md1-s1 kernel: Lustre: 23750:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 27 16:16:01 fir-md1-s1 kernel: LustreError: 20184:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f0ddde73400 Aug 27 16:16:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.101.56@o2ib4, removing former export from same NID Aug 27 16:16:08 fir-md1-s1 kernel: Lustre: Skipped 133 previous similar messages Aug 27 16:16:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.1@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Aug 27 16:16:11 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Aug 27 16:16:39 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 16:16:39 fir-md1-s1 kernel: LNetError: 20186:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 143 previous similar messages Aug 27 16:16:45 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f2315c72200 Aug 27 16:16:54 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f1fa299f200 x1643050088996912/t0(0) o101->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:29/0 lens 592/3264 e 0 to 0 dl 1566947819 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 16:16:54 fir-md1-s1 kernel: Lustre: 26258:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 276 previous similar messages Aug 27 16:17:11 fir-md1-s1 kernel: LustreError: 20183:0:(events.c:450:server_bulk_callback()) event type 5, status -125, 
desc ffff8f33e22cee00 Aug 27 16:17:12 fir-md1-s1 kernel: LustreError: 20197:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8f10b358b000 Aug 27 16:17:40 fir-md1-s1 kernel: LustreError: 21908:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f0b9365b900 x1642588074903616/t0(0) o37->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:14/0 lens 448/440 e 0 to 0 dl 1566947864 ref 1 fl Interpret:/2/0 rc 0/0 Aug 27 16:17:40 fir-md1-s1 kernel: LustreError: 21908:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 6 previous similar messages Aug 27 16:17:59 fir-md1-s1 kernel: LustreError: 97656:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566947789, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f18d5dd1200/0x5d9ee6e699401aad lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 97656 timeout: 0 lvb_type: 0 Aug 27 16:17:59 fir-md1-s1 kernel: LustreError: 97656:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Aug 27 16:19:25 fir-md1-s1 kernel: LustreError: 71872:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f0e93565400 x1642588100434096/t0(0) o37->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:25/0 lens 448/440 e 0 to 0 dl 1566947995 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 16:19:25 fir-md1-s1 kernel: LustreError: 71872:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 14 previous similar messages Aug 27 16:19:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 9af77c67-3c22-54c0-3a66-3f7facf781a7 (at 10.9.107.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f251ba56000, cur 1566947973 expire 1566947823 last 1566947746 Aug 27 16:19:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 16:19:49 fir-md1-s1 kernel: LNet: Service thread pid 97656 was inactive for 200.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 16:19:49 fir-md1-s1 kernel: Pid: 97656, comm: mdt01_095 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 16:19:49 fir-md1-s1 kernel: Call Trace: Aug 27 16:19:49 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 16:19:49 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 16:19:49 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 16:19:49 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 16:19:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 16:19:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 16:19:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 16:19:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 16:19:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 16:19:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566947989.97656 Aug 27 16:22:55 fir-md1-s1 kernel: LustreError: 71847:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f136f4d4050 
x1642588140758560/t0(0) o37->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:25/0 lens 448/440 e 0 to 0 dl 1566948205 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 16:22:55 fir-md1-s1 kernel: LustreError: 71847:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Aug 27 16:23:33 fir-md1-s1 kernel: Lustre: 21678:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566948206/real 1566948206] req@ffff8f41e26ad400 x1636782261866240/t0(0) o104->fir-MDT0002@10.9.101.58@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1566948213 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 27 16:23:33 fir-md1-s1 kernel: Lustre: 21678:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Aug 27 16:24:57 fir-md1-s1 kernel: LustreError: 23455:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566948207, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f2b99d09440/0x5d9ee6e69eeab449 lrc: 3/1,0 mode: --/PR res: [0x2c002cc20:0x117c8:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23455 timeout: 0 lvb_type: 0 Aug 27 16:24:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 16:24:58 fir-md1-s1 kernel: Lustre: Skipped 1036 previous similar messages Aug 27 16:25:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f1bad2aa-6db1-dd20-85dc-e36aabd3f07a (at 10.9.103.34@o2ib4) reconnecting Aug 27 16:25:11 fir-md1-s1 kernel: Lustre: Skipped 997 previous similar messages Aug 27 16:26:00 fir-md1-s1 kernel: LustreError: 21678:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.101.58@o2ib4) failed to reply to blocking AST (req@ffff8f41e26ad400 x1636782261866240 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f229eca7980/0x5d9ee6e68320ea78 lrc: 4/0,0 mode: PR/PR res: 
[0x2c002cc20:0x117c8:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.101.58@o2ib4 remote: 0xa3260ff1cac13be2 expref: 676 pid: 24587 timeout: 6063562 lvb_type: 0 Aug 27 16:26:00 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.101.58@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Aug 27 16:26:00 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.101.58@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8f229eca7980/0x5d9ee6e68320ea78 lrc: 4/0,0 mode: PR/PR res: [0x2c002cc20:0x117c8:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.101.58@o2ib4 remote: 0xa3260ff1cac13be2 expref: 677 pid: 24587 timeout: 0 lvb_type: 0 Aug 27 16:26:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f29e1e71-511a-3e98-949d-3f54561359cc (at 10.9.101.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f22fb84e000, cur 1566948379 expire 1566948229 last 1566948152 Aug 27 16:26:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 16:34:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 27 16:34:58 fir-md1-s1 kernel: Lustre: Skipped 746 previous similar messages Aug 27 16:35:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f37a46e0-1e70-6b27-1459-0c7be76fae27 (at 10.0.10.3@o2ib7) reconnecting Aug 27 16:35:11 fir-md1-s1 kernel: Lustre: Skipped 729 previous similar messages Aug 27 16:44:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 27 16:44:59 fir-md1-s1 kernel: Lustre: Skipped 742 previous similar messages Aug 27 16:45:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 214bcacf-deef-8b1a-7220-98313adef1de (at 10.9.102.36@o2ib4) reconnecting Aug 27 16:45:12 fir-md1-s1 kernel: Lustre: Skipped 735 previous similar messages Aug 27 16:52:39 fir-md1-s1 kernel: LustreError: 27023:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f1f5e2eb600 x1643050097371168/t0(0) o37->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:15/0 lens 448/440 e 0 to 0 dl 1566949965 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 16:53:41 fir-md1-s1 kernel: LustreError: 21037:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk WRITE failed: rc -107 req@ffff8f2c198d6850 x1643050097625728/t0(0) o4->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:17/0 lens 488/448 e 0 to 0 dl 1566950027 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 16:53:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6), client will retry: rc = -107 Aug 27 16:53:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 16:53:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO read error with ec9719ae-e98d-245f-cb43-8c61dda19eb4 (at 10.8.18.29@o2ib6), 
client will retry: rc -107 Aug 27 16:53:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 16:55:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2a70e919-847e-dc4c-98c2-dcd61e6f6ee4 (at 10.9.102.24@o2ib4) Aug 27 16:55:00 fir-md1-s1 kernel: Lustre: Skipped 733 previous similar messages Aug 27 16:55:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client af04ed26-0e4b-db45-3414-20245014a46d (at 10.8.27.34@o2ib6) reconnecting Aug 27 16:55:13 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 17:01:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 17fa2f85-b498-6aea-0e9b-b4cd8046edb1 (at 10.9.115.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f42c7f3d800, cur 1566950486 expire 1566950336 last 1566950259 Aug 27 17:01:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Aug 27 17:05:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1fe364ff-090f-36f6-ab95-451faca68f9f (at 10.9.102.21@o2ib4) Aug 27 17:05:00 fir-md1-s1 kernel: Lustre: Skipped 736 previous similar messages Aug 27 17:05:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 02dfd968-e7b1-52cc-0db8-aa0d10c0832c (at 10.9.102.19@o2ib4) reconnecting Aug 27 17:05:15 fir-md1-s1 kernel: Lustre: Skipped 728 previous similar messages Aug 27 17:15:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 17:15:00 fir-md1-s1 kernel: Lustre: Skipped 729 previous similar messages Aug 27 17:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 17:15:17 fir-md1-s1 kernel: Lustre: Skipped 737 previous similar messages Aug 27 17:19:31 fir-md1-s1 kernel: LustreError: 71858:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3384e32d00 x1643050103762528/t0(0) o37->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:7/0 lens 
448/440 e 0 to 0 dl 1566951577 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 17:19:31 fir-md1-s1 kernel: LustreError: 71858:0:(ldlm_lib.c:3207:target_bulk_io()) Skipped 1 previous similar message Aug 27 17:25:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6b77b9c9-5399-d7d0-f89a-3e0962ace3c7 (at 10.8.28.5@o2ib6) Aug 27 17:25:00 fir-md1-s1 kernel: Lustre: Skipped 736 previous similar messages Aug 27 17:25:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client af04ed26-0e4b-db45-3414-20245014a46d (at 10.8.27.34@o2ib6) reconnecting Aug 27 17:25:19 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 17:32:57 fir-md1-s1 kernel: LustreError: 27027:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f22e9572100 x1643050106754384/t0(0) o37->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:27/0 lens 448/440 e 0 to 0 dl 1566952407 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 17:35:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 17:35:01 fir-md1-s1 kernel: Lustre: Skipped 729 previous similar messages Aug 27 17:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 214bcacf-deef-8b1a-7220-98313adef1de (at 10.9.102.36@o2ib4) reconnecting Aug 27 17:35:19 fir-md1-s1 kernel: Lustre: Skipped 735 previous similar messages Aug 27 17:45:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 17:45:02 fir-md1-s1 kernel: Lustre: Skipped 727 previous similar messages Aug 27 17:45:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 27 17:45:21 fir-md1-s1 kernel: Lustre: Skipped 733 previous similar messages Aug 27 17:55:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 27 17:55:04 fir-md1-s1 kernel: Lustre: Skipped 737 previous similar 
messages Aug 27 17:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 02dfd968-e7b1-52cc-0db8-aa0d10c0832c (at 10.9.102.19@o2ib4) reconnecting Aug 27 17:55:23 fir-md1-s1 kernel: Lustre: Skipped 727 previous similar messages Aug 27 18:05:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to fc0afdb4-925b-760a-36c3-6eae7e9372be (at 10.9.102.35@o2ib4) Aug 27 18:05:07 fir-md1-s1 kernel: Lustre: Skipped 732 previous similar messages Aug 27 18:05:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 27 18:05:24 fir-md1-s1 kernel: Lustre: Skipped 737 previous similar messages Aug 27 18:15:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6b77b9c9-5399-d7d0-f89a-3e0962ace3c7 (at 10.8.28.5@o2ib6) Aug 27 18:15:08 fir-md1-s1 kernel: Lustre: Skipped 736 previous similar messages Aug 27 18:15:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 437db638-1a8f-d9e7-3d4a-b386602e77f0 (at 10.9.102.35@o2ib4) reconnecting Aug 27 18:15:27 fir-md1-s1 kernel: Lustre: Skipped 732 previous similar messages Aug 27 18:25:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 18:25:09 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 18:25:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client aa3ee41d-cac0-6749-5220-bb62e9eebc36 (at 10.8.28.5@o2ib6) reconnecting Aug 27 18:25:28 fir-md1-s1 kernel: Lustre: Skipped 736 previous similar messages Aug 27 18:35:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 18:35:09 fir-md1-s1 kernel: Lustre: Skipped 727 previous similar messages Aug 27 18:35:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 27 18:35:29 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 18:45:12 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 27 18:45:12 fir-md1-s1 kernel: Lustre: Skipped 738 previous similar messages Aug 27 18:45:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client af04ed26-0e4b-db45-3414-20245014a46d (at 10.8.27.34@o2ib6) reconnecting Aug 27 18:45:29 fir-md1-s1 kernel: Lustre: Skipped 725 previous similar messages Aug 27 18:55:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 27 18:55:12 fir-md1-s1 kernel: Lustre: Skipped 730 previous similar messages Aug 27 18:55:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 27 18:55:32 fir-md1-s1 kernel: Lustre: Skipped 738 previous similar messages Aug 27 19:05:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 29b52eb8-dab6-4b88-7a0d-057d59d63b47 (at 10.8.17.22@o2ib6) Aug 27 19:05:13 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 19:05:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 19:05:33 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 19:15:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1fe364ff-090f-36f6-ab95-451faca68f9f (at 10.9.102.21@o2ib4) Aug 27 19:15:13 fir-md1-s1 kernel: Lustre: Skipped 732 previous similar messages Aug 27 19:15:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d31107d5-0348-6f95-7970-9bba1ab39904 (at 10.9.102.24@o2ib4) reconnecting Aug 27 19:15:33 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 19:21:10 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566958863/real 1566958863] req@ffff8f15e182e300 x1636782451356464/t0(0) o104->fir-MDT0002@10.8.28.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 
1566958870 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Aug 27 19:21:10 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages Aug 27 19:21:18 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f22e96fa400 x1631594036492496/t0(0) o101->5af85e95-71ec-5689-9879-f126f8845b44@10.8.27.1@o2ib6:23/0 lens 1792/3288 e 1 to 0 dl 1566958883 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 19:21:18 fir-md1-s1 kernel: Lustre: 21455:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Aug 27 19:21:31 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566958884/real 1566958884] req@ffff8f15e182e300 x1636782451356464/t0(0) o104->fir-MDT0002@10.8.28.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1566958891 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Aug 27 19:21:31 fir-md1-s1 kernel: Lustre: 97661:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Aug 27 19:21:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 58b596bd-0d10-44d4-98c1-1cccf4b18cd9 (at 10.9.104.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f3cecb59000, cur 1566958894 expire 1566958744 last 1566958667 Aug 27 19:21:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 19:21:38 fir-md1-s1 kernel: LustreError: 97661:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.28.3@o2ib6) failed to reply to blocking AST (req@ffff8f15e182e300 x1636782451356464 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8f2e16b6de80/0x5d9ee6e6e02333c8 lrc: 4/0,0 mode: PR/PR res: [0x2c002c9ae:0x305f:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: 0x60200400000020 nid: 10.8.28.3@o2ib6 remote: 0x8a5f985ce096fa6f expref: 4408 pid: 97651 timeout: 6073980 lvb_type: 0 Aug 27 19:21:38 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.28.3@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Aug 27 19:21:38 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.28.3@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e16b6de80/0x5d9ee6e6e02333c8 lrc: 3/0,0 mode: PR/PR res: [0x2c002c9ae:0x305f:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: 0x60200400000020 nid: 10.8.28.3@o2ib6 remote: 0x8a5f985ce096fa6f expref: 4409 pid: 97651 timeout: 0 lvb_type: 0 Aug 27 19:22:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ca15d879-1cb2-8780-e5e2-20230d9e27cf (at 10.8.28.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f252115a800, cur 1566958974 expire 1566958824 last 1566958747 Aug 27 19:22:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 27 19:23:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a6b0a448-3849-1122-7e48-8cd92299876a (at 10.8.28.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f252eb46400, cur 1566959025 expire 1566958875 last 1566958798 Aug 27 19:25:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 19:25:14 fir-md1-s1 kernel: Lustre: Skipped 722 previous similar messages Aug 27 19:25:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 98305627-c518-5132-2386-e8ff7f2f8fb5 (at 10.9.102.21@o2ib4) reconnecting Aug 27 19:25:33 fir-md1-s1 kernel: Lustre: Skipped 721 previous similar messages Aug 27 19:26:40 fir-md1-s1 kernel: LustreError: 20962:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3e71c78000 x1643047872822864/t0(0) o37->01220ca0-c29f-4cb8-bddb-c495482aa608@10.9.0.61@o2ib4:16/0 lens 448/440 e 0 to 0 dl 1566959206 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 19:29:21 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 19:29:21 fir-md1-s1 kernel: LNetError: 20198:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 53 previous similar messages Aug 27 19:33:07 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 19:33:07 fir-md1-s1 kernel: LNetError: 20184:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 19:35:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6020dc6-5ae0-1fda-6229-432d9300dcb9 (at 10.9.0.61@o2ib4) Aug 27 19:35:14 fir-md1-s1 kernel: Lustre: Skipped 719 previous similar messages Aug 27 19:35:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bdb11576-343d-220d-63d1-1ff1ea0ae4cb (at 10.8.28.7@o2ib6) reconnecting Aug 27 19:35:33 fir-md1-s1 kernel: Lustre: Skipped 710 previous similar messages Aug 27 19:36:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 22cf9651-af46-3436-6bbd-858bc1edfdfc (at 10.8.6.26@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8f1feee08400, cur 1566959780 expire 1566959630 last 1566959553 Aug 27 19:39:27 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 19:45:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 19:45:14 fir-md1-s1 kernel: Lustre: Skipped 710 previous similar messages Aug 27 19:45:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client aa3ee41d-cac0-6749-5220-bb62e9eebc36 (at 10.8.28.5@o2ib6) reconnecting Aug 27 19:45:34 fir-md1-s1 kernel: Lustre: Skipped 718 previous similar messages Aug 27 19:46:44 fir-md1-s1 kernel: LustreError: 21878:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f36ec484500 x1643047990480672/t0(0) o37->01220ca0-c29f-4cb8-bddb-c495482aa608@10.9.0.61@o2ib4:20/0 lens 448/440 e 0 to 0 dl 1566960410 ref 1 fl Interpret:/0/0 rc 0/0 Aug 27 19:47:42 fir-md1-s1 kernel: Lustre: 20462:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f1d34cd6c00 x1634279796377632/t0(0) o101->9eb88991-51bc-5034-bc55-1b8fa8295e05@10.8.27.24@o2ib6:17/0 lens 592/3264 e 1 to 0 dl 1566960467 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 19:48:07 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Aug 27 19:48:58 fir-md1-s1 kernel: LustreError: 20728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566960447, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f16622e4a40/0x5d9ee6e6e8f3e889 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 26 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20728 timeout: 0 lvb_type: 0 Aug 27 19:50:48 fir-md1-s1 
kernel: LNet: Service thread pid 20728 was inactive for 200.26s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 19:50:48 fir-md1-s1 kernel: Pid: 20728, comm: mdt01_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 19:50:48 fir-md1-s1 kernel: Call Trace: Aug 27 19:50:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 19:50:48 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 19:50:48 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 19:50:48 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 19:50:48 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 19:50:48 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 19:50:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 19:50:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 19:50:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 19:50:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566960648.20728 Aug 27 19:54:05 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 19:54:05 fir-md1-s1 kernel: LNetError: 20192:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Aug 27 19:55:15 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 19:55:15 fir-md1-s1 kernel: Lustre: Skipped 732 previous similar messages Aug 27 19:55:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 27 19:55:34 fir-md1-s1 kernel: Lustre: Skipped 738 previous similar messages Aug 27 20:00:23 fir-md1-s1 kernel: Lustre: 21421:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e085f8000 x1643052943549872/t0(0) o101->f37a46e0-1e70-6b27-1459-0c7be76fae27@10.0.10.3@o2ib7:28/0 lens 576/3264 e 1 to 0 dl 1566961228 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 20:01:38 fir-md1-s1 kernel: LustreError: 23569:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566961208, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f1189fd7740/0x5d9ee6e6ecf84bcd lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x12/0x0 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23569 timeout: 0 lvb_type: 0 Aug 27 20:03:29 fir-md1-s1 kernel: LNet: Service thread pid 23569 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 20:03:29 fir-md1-s1 kernel: Pid: 23569, comm: mdt00_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 20:03:29 fir-md1-s1 kernel: Call Trace: Aug 27 20:03:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 20:03:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 20:03:29 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Aug 27 20:03:29 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 20:03:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 20:03:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 20:03:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 20:03:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 20:03:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 20:03:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566961409.23569 Aug 27 20:05:10 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 20:05:10 fir-md1-s1 kernel: LNetError: 20190:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 6 previous similar messages Aug 27 20:05:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to f12e1763-87eb-0ba4-fc21-3f444b35e74b (at 10.9.105.14@o2ib4) Aug 27 20:05:17 fir-md1-s1 kernel: Lustre: 
Skipped 743 previous similar messages Aug 27 20:05:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 27 20:05:35 fir-md1-s1 kernel: Lustre: Skipped 731 previous similar messages Aug 27 20:05:54 fir-md1-s1 kernel: Lustre: 23577:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f121912ce00 x1642681786570576/t0(0) o101->a2d9e346-a053-de8e-7ad6-cf9b0f3782fb@10.9.102.2@o2ib4:29/0 lens 592/3264 e 1 to 0 dl 1566961559 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 20:07:09 fir-md1-s1 kernel: LustreError: 21419:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566961539, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0fab55e9c0/0x5d9ee6e6ee458179 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x13/0x0 rrc: 29 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21419 timeout: 0 lvb_type: 0 Aug 27 20:08:59 fir-md1-s1 kernel: LNet: Service thread pid 21419 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 20:08:59 fir-md1-s1 kernel: Pid: 21419, comm: mdt00_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 20:08:59 fir-md1-s1 kernel: Call Trace: Aug 27 20:08:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 20:08:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 20:08:59 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 20:08:59 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 20:08:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 20:08:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 20:08:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 20:08:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 20:08:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 20:08:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566961739.21419 Aug 27 20:15:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 27 20:15:21 fir-md1-s1 kernel: Lustre: Skipped 780 previous similar messages Aug 27 20:15:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 20:15:36 fir-md1-s1 kernel: Lustre: Skipped 781 previous similar messages Aug 27 20:25:21 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to 1062a832-c778-75e1-da43-5a08be7649ee (at 10.9.102.38@o2ib4) Aug 27 20:25:21 fir-md1-s1 kernel: Lustre: Skipped 773 previous similar messages Aug 27 20:25:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bdb11576-343d-220d-63d1-1ff1ea0ae4cb (at 10.8.28.7@o2ib6) reconnecting Aug 27 20:25:37 fir-md1-s1 kernel: Lustre: Skipped 767 previous similar messages Aug 27 20:35:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 20:35:21 fir-md1-s1 kernel: Lustre: Skipped 769 previous similar messages Aug 27 20:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 569c80f1-e322-40ae-cf23-d3ca8807a6fa (at 10.9.102.40@o2ib4) reconnecting Aug 27 20:35:38 fir-md1-s1 kernel: Lustre: Skipped 769 previous similar messages Aug 27 20:45:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 20:45:21 fir-md1-s1 kernel: Lustre: Skipped 766 previous similar messages Aug 27 20:45:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client de259a64-2100-eb0d-e7c9-3532a08afec2 (at 10.9.102.41@o2ib4) reconnecting Aug 27 20:45:38 fir-md1-s1 kernel: Lustre: Skipped 770 previous similar messages Aug 27 20:55:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 44722a71-cad3-3d5d-1efa-3e651b285674 (at 10.9.102.2@o2ib4) Aug 27 20:55:22 fir-md1-s1 kernel: Lustre: Skipped 771 previous similar messages Aug 27 20:55:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 27 20:55:40 fir-md1-s1 kernel: Lustre: Skipped 766 previous similar messages Aug 27 20:56:24 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Aug 27 20:56:24 fir-md1-s1 kernel: LNetError: 20193:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages Aug 27 
21:05:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 27 21:05:22 fir-md1-s1 kernel: Lustre: Skipped 782 previous similar messages Aug 27 21:05:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 27 21:05:45 fir-md1-s1 kernel: Lustre: Skipped 792 previous similar messages Aug 27 21:15:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 38b82dfa-e53b-ccd1-6116-2509b41b20ab (at 10.9.102.37@o2ib4) Aug 27 21:15:22 fir-md1-s1 kernel: Lustre: Skipped 767 previous similar messages Aug 27 21:15:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 21:15:46 fir-md1-s1 kernel: Lustre: Skipped 769 previous similar messages Aug 27 21:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 27 21:25:23 fir-md1-s1 kernel: Lustre: Skipped 770 previous similar messages Aug 27 21:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 8c44a420-9990-75c1-2b64-64b6fe5d1b1b (at 10.9.102.27@o2ib4) reconnecting Aug 27 21:25:47 fir-md1-s1 kernel: Lustre: Skipped 768 previous similar messages Aug 27 21:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 29b52eb8-dab6-4b88-7a0d-057d59d63b47 (at 10.8.17.22@o2ib6) Aug 27 21:35:23 fir-md1-s1 kernel: Lustre: Skipped 769 previous similar messages Aug 27 21:35:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 98305627-c518-5132-2386-e8ff7f2f8fb5 (at 10.9.102.21@o2ib4) reconnecting Aug 27 21:35:47 fir-md1-s1 kernel: Lustre: Skipped 773 previous similar messages Aug 27 21:45:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 27 21:45:23 fir-md1-s1 kernel: Lustre: Skipped 766 previous similar messages Aug 27 21:45:47 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 27 21:45:47 fir-md1-s1 kernel: Lustre: Skipped 765 previous similar messages Aug 27 21:55:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 27 21:55:25 fir-md1-s1 kernel: Lustre: Skipped 773 previous similar messages Aug 27 21:55:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 27 21:55:49 fir-md1-s1 kernel: Lustre: Skipped 775 previous similar messages Aug 27 22:05:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 27 22:05:26 fir-md1-s1 kernel: Lustre: Skipped 766 previous similar messages Aug 27 22:05:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 22:05:49 fir-md1-s1 kernel: Lustre: Skipped 767 previous similar messages Aug 27 22:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 27 22:15:30 fir-md1-s1 kernel: Lustre: Skipped 748 previous similar messages Aug 27 22:15:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fbf0693a-98fc-0dbf-e84f-7c1f5df6792d (at 10.9.102.37@o2ib4) reconnecting Aug 27 22:15:50 fir-md1-s1 kernel: Lustre: Skipped 737 previous similar messages Aug 27 22:16:46 fir-md1-s1 kernel: Lustre: 10505:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0b9dc96900 x1643051662808320/t0(0) o101->9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5@10.9.0.64@o2ib4:21/0 lens 584/3264 e 1 to 0 dl 1566969411 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 22:18:01 fir-md1-s1 kernel: LustreError: 10504:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566969391, 90s ago); not entering recovery in 
server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1275822640/0x5d9ee6e70bada6a3 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10504 timeout: 0 lvb_type: 0 Aug 27 22:18:45 fir-md1-s1 kernel: Lustre: 10505:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e9631c200 x1642588693470928/t0(0) o101->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:20/0 lens 584/3264 e 1 to 0 dl 1566969530 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 22:19:27 fir-md1-s1 kernel: Lustre: 21412:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-7), not sending early reply req@ffff8f331b092a00 x1642588693574624/t0(0) o101->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:2/0 lens 584/3264 e 1 to 0 dl 1566969572 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 22:19:51 fir-md1-s1 kernel: LNet: Service thread pid 10504 was inactive for 200.69s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 22:19:51 fir-md1-s1 kernel: Pid: 10504, comm: mdt00_042 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 22:19:51 fir-md1-s1 kernel: Call Trace: Aug 27 22:19:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 22:19:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 22:19:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 22:19:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 22:19:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 22:19:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 22:19:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 22:19:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 22:19:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 22:19:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566969591.10504 Aug 27 22:20:00 fir-md1-s1 kernel: LustreError: 20457:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566969510, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f07c5f1ba80/0x5d9ee6e70bdaec42 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20457 timeout: 0 lvb_type: 0 
Aug 27 22:20:30 fir-md1-s1 kernel: LustreError: 23607:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566969540, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ac3114a40/0x5d9ee6e70be681a2 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23607 timeout: 0 lvb_type: 0 Aug 27 22:21:50 fir-md1-s1 kernel: LNet: Service thread pid 20457 was inactive for 200.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 22:21:50 fir-md1-s1 kernel: Pid: 20457, comm: mdt00_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 22:21:50 fir-md1-s1 kernel: Call Trace: Aug 27 22:21:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 22:21:50 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 22:21:50 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 22:21:50 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 22:21:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 22:21:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 22:21:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 22:21:50 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Aug 27 22:21:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 22:21:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566969710.20457 Aug 27 22:22:20 fir-md1-s1 kernel: LNet: Service thread pid 23607 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 27 22:22:20 fir-md1-s1 kernel: Pid: 23607, comm: mdt02_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 22:22:20 fir-md1-s1 kernel: Call Trace: Aug 27 22:22:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 22:22:20 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 22:22:20 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 27 22:22:20 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 22:22:20 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 22:22:20 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 22:22:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 22:22:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 22:22:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 22:22:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566969740.23607 Aug 27 22:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 
10.9.0.62@o2ib4) Aug 27 22:25:34 fir-md1-s1 kernel: Lustre: Skipped 790 previous similar messages Aug 27 22:25:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 27 22:25:51 fir-md1-s1 kernel: Lustre: Skipped 791 previous similar messages Aug 27 22:35:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a19fbd52-fc1f-6afe-5025-88bbd6370298 (at 10.9.102.36@o2ib4) Aug 27 22:35:34 fir-md1-s1 kernel: Lustre: Skipped 803 previous similar messages Aug 27 22:35:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 27 22:35:55 fir-md1-s1 kernel: Lustre: Skipped 798 previous similar messages Aug 27 22:38:23 fir-md1-s1 kernel: Lustre: 21370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0e55c24b00 x1643052944957504/t0(0) o101->f37a46e0-1e70-6b27-1459-0c7be76fae27@10.0.10.3@o2ib7:28/0 lens 576/3264 e 1 to 0 dl 1566970708 ref 2 fl Interpret:/0/0 rc 0/0 Aug 27 22:39:38 fir-md1-s1 kernel: LustreError: 21410:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566970688, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0a1087b600/0x5d9ee6e70f518974 lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x12/0x0 rrc: 31 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21410 timeout: 0 lvb_type: 0 Aug 27 22:41:29 fir-md1-s1 kernel: LNet: Service thread pid 21410 was inactive for 200.63s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 27 22:41:29 fir-md1-s1 kernel: Pid: 21410, comm: mdt00_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 27 22:41:29 fir-md1-s1 kernel: Call Trace: Aug 27 22:41:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 27 22:41:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 27 22:41:29 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Aug 27 22:41:29 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 27 22:41:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 27 22:41:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 27 22:41:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 27 22:41:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 27 22:41:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 27 22:41:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1566970889.21410 Aug 27 22:45:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 27 22:45:34 fir-md1-s1 kernel: Lustre: Skipped 817 previous similar messages Aug 27 22:45:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5 (at 10.9.0.64@o2ib4) reconnecting Aug 27 22:45:55 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 27 22:55:34 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to b6020dc6-5ae0-1fda-6229-432d9300dcb9 (at 10.9.0.61@o2ib4) Aug 27 22:55:34 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 27 22:55:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 27 22:55:57 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 27 23:05:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 75532495-fd40-56b3-66c3-c614ac097dda (at 10.8.27.34@o2ib6) Aug 27 23:05:34 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 27 23:05:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 397e53ea-489f-22f1-95c4-27ab82ab5709 (at 10.9.102.43@o2ib4) reconnecting Aug 27 23:05:57 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 27 23:15:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 27 23:15:35 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 27 23:15:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 27 23:15:58 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 27 23:25:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 38b82dfa-e53b-ccd1-6116-2509b41b20ab (at 10.9.102.37@o2ib4) Aug 27 23:25:35 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 27 23:25:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 27 23:25:59 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 27 23:35:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 27 23:35:36 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 27 23:36:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
8c44a420-9990-75c1-2b64-64b6fe5d1b1b (at 10.9.102.27@o2ib4) reconnecting Aug 27 23:36:00 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 27 23:45:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 27 23:45:37 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 27 23:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 98305627-c518-5132-2386-e8ff7f2f8fb5 (at 10.9.102.21@o2ib4) reconnecting Aug 27 23:46:00 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 27 23:55:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 27 23:55:37 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 27 23:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 27 23:56:00 fir-md1-s1 kernel: Lustre: Skipped 823 previous similar messages Aug 28 00:05:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 28 00:05:38 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 00:06:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 28 00:06:01 fir-md1-s1 kernel: Lustre: Skipped 832 previous similar messages Aug 28 00:15:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 28 00:15:39 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 00:16:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 28 00:16:03 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 00:25:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
dee4f99a-2654-25ae-e6ec-cb4bc3f136c5 (at 10.8.28.6@o2ib6) Aug 28 00:25:39 fir-md1-s1 kernel: Lustre: Skipped 832 previous similar messages Aug 28 00:26:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fbf0693a-98fc-0dbf-e84f-7c1f5df6792d (at 10.9.102.37@o2ib4) reconnecting Aug 28 00:26:03 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 00:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 28 00:35:39 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 00:36:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 00:36:04 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 00:45:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5a6d64b7-8368-162d-98b4-716457bd6d0c (at 10.9.102.19@o2ib4) Aug 28 00:45:40 fir-md1-s1 kernel: Lustre: Skipped 822 previous similar messages Aug 28 00:46:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 28 00:46:08 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 00:49:15 fir-md1-s1 kernel: LustreError: 20954:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3ffc73cc50 x1643051964659888/t0(0) o37->01220ca0-c29f-4cb8-bddb-c495482aa608@10.9.0.61@o2ib4:21/0 lens 448/440 e 0 to 0 dl 1566978561 ref 1 fl Interpret:/0/0 rc 0/0 Aug 28 00:55:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 28 00:55:42 fir-md1-s1 kernel: Lustre: Skipped 832 previous similar messages Aug 28 00:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3cf2eac3-6000-d1ad-26af-7aa417c35563 (at 10.9.103.25@o2ib4) reconnecting Aug 28 00:56:08 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 01:05:45 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Connection restored to d885ba76-20e7-1b98-2018-7e24c1d853b4 (at 10.8.0.68@o2ib6) Aug 28 01:05:45 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 01:06:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 28 01:06:09 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 01:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to a19fbd52-fc1f-6afe-5025-88bbd6370298 (at 10.9.102.36@o2ib4) Aug 28 01:15:45 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 01:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bdb11576-343d-220d-63d1-1ff1ea0ae4cb (at 10.8.28.7@o2ib6) reconnecting Aug 28 01:16:09 fir-md1-s1 kernel: Lustre: Skipped 821 previous similar messages Aug 28 01:25:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 28 01:25:45 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 01:26:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f37a46e0-1e70-6b27-1459-0c7be76fae27 (at 10.0.10.3@o2ib7) reconnecting Aug 28 01:26:09 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 01:35:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6020dc6-5ae0-1fda-6229-432d9300dcb9 (at 10.9.0.61@o2ib4) Aug 28 01:35:46 fir-md1-s1 kernel: Lustre: Skipped 821 previous similar messages Aug 28 01:36:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 28 01:36:12 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 01:45:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 28 01:45:46 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 01:46:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
d31107d5-0348-6f95-7970-9bba1ab39904 (at 10.9.102.24@o2ib4) reconnecting Aug 28 01:46:13 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 01:55:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 28 01:55:46 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 28 01:56:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1f080c09-2fe8-8c89-493e-0f353450ad44 (at 10.9.102.20@o2ib4) reconnecting Aug 28 01:56:13 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 02:05:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 460b4624-f225-0fc6-9d6f-aee495221c30 (at 10.9.102.46@o2ib4) Aug 28 02:05:48 fir-md1-s1 kernel: Lustre: Skipped 823 previous similar messages Aug 28 02:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 28 02:06:14 fir-md1-s1 kernel: Lustre: Skipped 823 previous similar messages Aug 28 02:15:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 28 02:15:48 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 02:16:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3cf2eac3-6000-d1ad-26af-7aa417c35563 (at 10.9.103.25@o2ib4) reconnecting Aug 28 02:16:14 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 02:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 75532495-fd40-56b3-66c3-c614ac097dda (at 10.8.27.34@o2ib6) Aug 28 02:25:48 fir-md1-s1 kernel: Lustre: Skipped 821 previous similar messages Aug 28 02:26:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 28 02:26:16 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 02:35:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 28 02:35:49 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 02:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 59f098aa-fb21-8ed8-84bd-d0ce06cad654 (at 10.9.102.46@o2ib4) reconnecting Aug 28 02:36:17 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 02:45:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 28 02:45:49 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 02:46:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 02:46:17 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 02:55:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 28 02:55:50 fir-md1-s1 kernel: Lustre: Skipped 823 previous similar messages Aug 28 02:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 28 02:56:20 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 03:05:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3cf0b37c-2835-0832-0bb4-dc0931773c92 (at 10.9.103.25@o2ib4) Aug 28 03:05:50 fir-md1-s1 kernel: Lustre: Skipped 834 previous similar messages Aug 28 03:06:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bb520558-2d6f-90e6-2e79-a805392091ac (at 10.9.102.38@o2ib4) reconnecting Aug 28 03:06:20 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 03:15:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 75532495-fd40-56b3-66c3-c614ac097dda (at 10.8.27.34@o2ib6) Aug 28 03:15:52 fir-md1-s1 kernel: Lustre: Skipped 822 previous similar messages Aug 28 03:16:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 
10.8.0.67@o2ib6) reconnecting Aug 28 03:16:20 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 03:25:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 28 03:25:52 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 03:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 01220ca0-c29f-4cb8-bddb-c495482aa608 (at 10.9.0.61@o2ib4) reconnecting Aug 28 03:26:20 fir-md1-s1 kernel: Lustre: Skipped 819 previous similar messages Aug 28 03:35:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 28 03:35:53 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 03:36:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 28 03:36:22 fir-md1-s1 kernel: Lustre: Skipped 833 previous similar messages Aug 28 03:45:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 28 03:45:55 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 03:46:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 28 03:46:25 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 28 03:55:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2a70e919-847e-dc4c-98c2-dcd61e6f6ee4 (at 10.9.102.24@o2ib4) Aug 28 03:55:55 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 03:56:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5 (at 10.9.0.64@o2ib4) reconnecting Aug 28 03:56:25 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 04:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 75532495-fd40-56b3-66c3-c614ac097dda (at 10.8.27.34@o2ib6) Aug 28 04:05:55 
fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 04:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client de259a64-2100-eb0d-e7c9-3532a08afec2 (at 10.9.102.41@o2ib4) reconnecting Aug 28 04:06:26 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 04:15:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 28 04:15:56 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 28 04:16:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a2d9e346-a053-de8e-7ad6-cf9b0f3782fb (at 10.9.102.2@o2ib4) reconnecting Aug 28 04:16:26 fir-md1-s1 kernel: Lustre: Skipped 822 previous similar messages Aug 28 04:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3cf0b37c-2835-0832-0bb4-dc0931773c92 (at 10.9.103.25@o2ib4) Aug 28 04:25:57 fir-md1-s1 kernel: Lustre: Skipped 832 previous similar messages Aug 28 04:26:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f37a46e0-1e70-6b27-1459-0c7be76fae27 (at 10.0.10.3@o2ib7) reconnecting Aug 28 04:26:26 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 04:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 28 04:35:58 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 04:36:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 28 04:36:26 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 04:45:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 460b4624-f225-0fc6-9d6f-aee495221c30 (at 10.9.102.46@o2ib4) Aug 28 04:45:59 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 04:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 02dfd968-e7b1-52cc-0db8-aa0d10c0832c (at 10.9.102.19@o2ib4) reconnecting Aug 28 04:46:28 fir-md1-s1 kernel: Lustre: Skipped 822 
previous similar messages Aug 28 04:55:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 75532495-fd40-56b3-66c3-c614ac097dda (at 10.8.27.34@o2ib6) Aug 28 04:55:59 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 04:56:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 04:56:30 fir-md1-s1 kernel: Lustre: Skipped 833 previous similar messages Aug 28 05:05:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 28 05:05:59 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 05:06:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 28 05:06:32 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 05:15:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 28 05:15:59 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 05:16:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d31107d5-0348-6f95-7970-9bba1ab39904 (at 10.9.102.24@o2ib4) reconnecting Aug 28 05:16:32 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 05:26:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 28 05:26:00 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 05:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1f080c09-2fe8-8c89-493e-0f353450ad44 (at 10.9.102.20@o2ib4) reconnecting Aug 28 05:26:32 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 05:36:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 28 05:36:01 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 
05:36:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 28 05:36:32 fir-md1-s1 kernel: Lustre: Skipped 819 previous similar messages Aug 28 05:46:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b54c3ee-9728-0c5d-13b9-02c56cefb912 (at 10.8.28.7@o2ib6) Aug 28 05:46:02 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 05:46:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3cf2eac3-6000-d1ad-26af-7aa417c35563 (at 10.9.103.25@o2ib4) reconnecting Aug 28 05:46:34 fir-md1-s1 kernel: Lustre: Skipped 835 previous similar messages Aug 28 05:54:37 fir-md1-s1 kernel: LustreError: 21865:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8f1a3b28a050 x1642588769264224/t0(0) o37->bc86db0e-d9be-ea60-6163-701107d58182@10.9.0.62@o2ib4:13/0 lens 448/440 e 0 to 0 dl 1566996883 ref 1 fl Interpret:/0/0 rc 0/0 Aug 28 05:56:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 28 05:56:02 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 05:56:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 683a48e7-a11e-d27d-92b8-e668e8ebb59d (at 10.9.102.47@o2ib4) reconnecting Aug 28 05:56:34 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 06:06:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 5a6d64b7-8368-162d-98b4-716457bd6d0c (at 10.9.102.19@o2ib4) Aug 28 06:06:03 fir-md1-s1 kernel: Lustre: Skipped 822 previous similar messages Aug 28 06:06:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c56b34d4-3ae6-19a5-6d19-cc66577d2e25 (at 10.9.102.17@o2ib4) reconnecting Aug 28 06:06:36 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 06:16:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 28 06:16:05 fir-md1-s1 kernel: Lustre: Skipped 
829 previous similar messages Aug 28 06:16:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 06:16:36 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 06:26:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 28 06:26:06 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 06:26:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5 (at 10.9.0.64@o2ib4) reconnecting Aug 28 06:26:36 fir-md1-s1 kernel: Lustre: Skipped 821 previous similar messages Aug 28 06:36:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b54c3ee-9728-0c5d-13b9-02c56cefb912 (at 10.8.28.7@o2ib6) Aug 28 06:36:06 fir-md1-s1 kernel: Lustre: Skipped 823 previous similar messages Aug 28 06:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 569c80f1-e322-40ae-cf23-d3ca8807a6fa (at 10.9.102.40@o2ib4) reconnecting Aug 28 06:36:36 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 06:46:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72ddd52f-2877-4d72-483b-2a30690dc155 (at 10.0.10.3@o2ib7) Aug 28 06:46:06 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 06:46:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a2d9e346-a053-de8e-7ad6-cf9b0f3782fb (at 10.9.102.2@o2ib4) reconnecting Aug 28 06:46:36 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 28 06:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 28 06:56:07 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 06:56:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 28 06:56:38 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 
07:06:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 28 07:06:09 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 07:06:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 914b63c8-3a12-8009-32f3-deaae1cd82be (at 10.8.0.68@o2ib6) reconnecting Aug 28 07:06:38 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 07:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 28 07:16:09 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 07:16:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5 (at 10.9.0.64@o2ib4) reconnecting Aug 28 07:16:39 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 07:26:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 75532495-fd40-56b3-66c3-c614ac097dda (at 10.8.27.34@o2ib6) Aug 28 07:26:09 fir-md1-s1 kernel: Lustre: Skipped 821 previous similar messages Aug 28 07:26:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 02dfd968-e7b1-52cc-0db8-aa0d10c0832c (at 10.9.102.19@o2ib4) reconnecting Aug 28 07:26:40 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 28 07:36:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 28 07:36:10 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 07:36:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 07:36:41 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 07:46:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to e4594a87-2fe5-1bf8-dbe3-26a702178742 (at 10.8.0.67@o2ib6) Aug 28 07:46:11 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 07:46:42 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Client 129e30f2-c57f-6250-073e-65cd07205967 (at 10.8.0.67@o2ib6) reconnecting Aug 28 07:46:42 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 07:56:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 28 07:56:11 fir-md1-s1 kernel: Lustre: Skipped 825 previous similar messages Aug 28 07:56:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 569c80f1-e322-40ae-cf23-d3ca8807a6fa (at 10.9.102.40@o2ib4) reconnecting Aug 28 07:56:42 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 08:03:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9b7d443-6e99-c10b-4d68-3e3fa30c5530 (at 10.9.113.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f3a37550800, cur 1567004604 expire 1567004454 last 1567004377 Aug 28 08:03:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 28 08:06:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 28 08:06:12 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 08:06:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5 (at 10.9.0.64@o2ib4) reconnecting Aug 28 08:06:42 fir-md1-s1 kernel: Lustre: Skipped 823 previous similar messages Aug 28 08:16:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b54c3ee-9728-0c5d-13b9-02c56cefb912 (at 10.8.28.7@o2ib6) Aug 28 08:16:13 fir-md1-s1 kernel: Lustre: Skipped 820 previous similar messages Aug 28 08:16:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bc86db0e-d9be-ea60-6163-701107d58182 (at 10.9.0.62@o2ib4) reconnecting Aug 28 08:16:43 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 08:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3cf0b37c-2835-0832-0bb4-dc0931773c92 (at 10.9.103.25@o2ib4) Aug 28 08:26:15 fir-md1-s1 kernel: 
Lustre: Skipped 836 previous similar messages Aug 28 08:26:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 214bcacf-deef-8b1a-7220-98313adef1de (at 10.9.102.36@o2ib4) reconnecting Aug 28 08:26:43 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 08:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 29b52eb8-dab6-4b88-7a0d-057d59d63b47 (at 10.8.17.22@o2ib6) Aug 28 08:36:17 fir-md1-s1 kernel: Lustre: Skipped 826 previous similar messages Aug 28 08:36:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 683a48e7-a11e-d27d-92b8-e668e8ebb59d (at 10.9.102.47@o2ib4) reconnecting Aug 28 08:36:46 fir-md1-s1 kernel: Lustre: Skipped 822 previous similar messages Aug 28 08:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 28 08:46:18 fir-md1-s1 kernel: Lustre: Skipped 794 previous similar messages Aug 28 08:46:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d96d2d4a-213c-de28-afa6-2cb1bee603bd (at 10.8.17.22@o2ib6) reconnecting Aug 28 08:46:47 fir-md1-s1 kernel: Lustre: Skipped 798 previous similar messages Aug 28 08:46:58 fir-md1-s1 kernel: Lustre: 23660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f104780bf00 x1643052947350128/t0(0) o101->f37a46e0-1e70-6b27-1459-0c7be76fae27@10.0.10.3@o2ib7:3/0 lens 576/3264 e 1 to 0 dl 1567007223 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 08:48:13 fir-md1-s1 kernel: LustreError: 21411:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567007203, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f0f99e5a400/0x5d9ee6e77440948a lrc: 3/1,0 mode: --/PR res: [0x2c002cce2:0x50fd:0x0].0x0 bits 0x12/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21411 timeout: 0 lvb_type: 0 Aug 28 08:50:04 fir-md1-s1 kernel: LNet: Service thread pid 
21411 was inactive for 200.77s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 28 08:50:04 fir-md1-s1 kernel: Pid: 21411, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 08:50:04 fir-md1-s1 kernel: Call Trace: Aug 28 08:50:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 08:50:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 08:50:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Aug 28 08:50:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 08:50:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 08:50:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 08:50:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 08:50:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 08:50:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 08:50:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1567007404.21411 Aug 28 08:56:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 28 08:56:19 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 08:56:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 08:56:47 
fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 09:06:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7970551f-2ab2-caa1-77de-53cac10f4fea (at 10.9.102.18@o2ib4) Aug 28 09:06:20 fir-md1-s1 kernel: Lustre: Skipped 824 previous similar messages Aug 28 09:06:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b7b0a619-7218-f426-d2fe-580080e090ee (at 10.9.102.18@o2ib4) reconnecting Aug 28 09:06:51 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 09:08:45 fir-md1-s1 kernel: LustreError: 21014:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff8f3269ccbf00 x1643050200896928/t0(0) o37->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:21/0 lens 448/440 e 0 to 0 dl 1567008531 ref 1 fl Interpret:/0/0 rc 0/0 Aug 28 09:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3cf0b37c-2835-0832-0bb4-dc0931773c92 (at 10.9.103.25@o2ib4) Aug 28 09:16:22 fir-md1-s1 kernel: Lustre: Skipped 830 previous similar messages Aug 28 09:16:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3cf2eac3-6000-d1ad-26af-7aa417c35563 (at 10.9.103.25@o2ib4) reconnecting Aug 28 09:16:53 fir-md1-s1 kernel: Lustre: Skipped 835 previous similar messages Aug 28 09:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 28 09:26:22 fir-md1-s1 kernel: Lustre: Skipped 829 previous similar messages Aug 28 09:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 683a48e7-a11e-d27d-92b8-e668e8ebb59d (at 10.9.102.47@o2ib4) reconnecting Aug 28 09:26:53 fir-md1-s1 kernel: Lustre: Skipped 821 previous similar messages Aug 28 09:36:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b54c3ee-9728-0c5d-13b9-02c56cefb912 (at 10.8.28.7@o2ib6) Aug 28 09:36:22 fir-md1-s1 kernel: Lustre: Skipped 819 previous similar messages Aug 28 09:36:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c56b34d4-3ae6-19a5-6d19-cc66577d2e25 (at 
10.9.102.17@o2ib4) reconnecting Aug 28 09:36:54 fir-md1-s1 kernel: Lustre: Skipped 828 previous similar messages Aug 28 09:46:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 28 09:46:24 fir-md1-s1 kernel: Lustre: Skipped 832 previous similar messages Aug 28 09:46:51 fir-md1-s1 kernel: Lustre: 23584:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f26c2213f00 x1643050228225552/t0(0) o36->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:26/0 lens 488/3152 e 1 to 0 dl 1567010816 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 09:46:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 09:46:55 fir-md1-s1 kernel: Lustre: Skipped 827 previous similar messages Aug 28 09:47:05 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.0.67@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8f0f52bb2ac0/0x5d9ee6e77e4ffe3b lrc: 3/0,0 mode: PR/PR res: [0x2c002c57b:0x83ef:0x0].0x0 bits 0x5b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.0.67@o2ib6 remote: 0x4f2e9d302cac2e1a expref: 570135 pid: 23671 timeout: 6125885 lvb_type: 0 Aug 28 09:48:06 fir-md1-s1 kernel: LustreError: 23671:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567010796, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8f27b2f6c140/0x5d9ee6e77e568892 lrc: 3/0,1 mode: --/PW res: [0x2c002c57b:0x83ef:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23671 timeout: 0 lvb_type: 0 Aug 28 09:49:54 fir-md1-s1 kernel: Lustre: 23671:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:157s); client may timeout. 
req@ffff8f26c2213f00 x1643050228225552/t0(0) o36->129e30f2-c57f-6250-073e-65cd07205967@10.8.0.67@o2ib6:26/0 lens 488/424 e 1 to 0 dl 1567010837 ref 1 fl Complete:/0/0 rc -1/-1 Aug 28 09:49:54 fir-md1-s1 kernel: Lustre: 23671:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Aug 28 09:55:21 fir-md1-s1 kernel: LustreError: 23713:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4516ed5c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f396a799d40/0x5d9ee6e658181aff lrc: 3/0,0 mode: PW/PW res: [0x2c002c0b5:0x180fa:0x0].0x0 bits 0x40/0x0 rrc: 23 type: IBT flags: 0x50200400000020 nid: 10.9.109.57@o2ib4 remote: 0xa69726dbefa840fd expref: 16 pid: 23713 timeout: 0 lvb_type: 0 Aug 28 09:55:21 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (904:80569s); client may timeout. req@ffff8f3b47892a00 x1634188304182960/t0(0) o101->95cdb8fb-0e32-cb98-88bc-c0e9f3ec6a0b@10.9.109.57@o2ib4:28/0 lens 480/536 e 0 to 0 dl 1566930752 ref 1 fl Complete:/0/0 rc -107/-107 Aug 28 09:55:21 fir-md1-s1 kernel: LNet: Service thread pid 10502 completed after 81471.28s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 28 09:55:21 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 28 09:56:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 26a3bc1d-bdb0-bbb4-3006-c88ecc2f97cd (at 10.9.0.62@o2ib4) Aug 28 09:56:24 fir-md1-s1 kernel: Lustre: Skipped 812 previous similar messages Aug 28 09:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b3c51fa-e0b4-a52a-f8a8-37d700a7efb5 (at 10.9.0.64@o2ib4) reconnecting Aug 28 09:56:58 fir-md1-s1 kernel: Lustre: Skipped 806 previous similar messages Aug 28 10:00:40 fir-md1-s1 kernel: LustreError: 23589:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4516ed5c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f0d2cda0b40/0x5d9ee6e6581c767c lrc: 3/0,0 mode: PW/PW res: [0x2c002c0b5:0x180fa:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x50200400000020 nid: 10.9.109.57@o2ib4 remote: 0xa69726dbefa84166 expref: 12 pid: 23589 timeout: 0 lvb_type: 0 Aug 28 10:00:40 fir-md1-s1 kernel: LustreError: 23589:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Aug 28 10:00:40 fir-md1-s1 kernel: Lustre: 23589:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (848:80888s); client may timeout. req@ffff8f1bd58f6300 x1634188304192992/t0(0) o101->95cdb8fb-0e32-cb98-88bc-c0e9f3ec6a0b@10.9.109.57@o2ib4:3/0 lens 480/536 e 0 to 0 dl 1566930752 ref 1 fl Complete:/0/0 rc -107/-107 Aug 28 10:00:40 fir-md1-s1 kernel: LNet: Service thread pid 97653 completed after 81695.67s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Aug 28 10:00:40 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 28 10:00:40 fir-md1-s1 kernel: Lustre: 23589:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Aug 28 10:00:44 fir-md1-s1 kernel: LustreError: 21457:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f34f1f9e800 ns: mdt-fir-MDT0002_UUID lock: ffff8f15afc16780/0x5d9ee6e65832b191 lrc: 3/0,0 mode: PW/PW res: [0x2c002c0b5:0x180fa:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x50200400000020 nid: 10.9.109.58@o2ib4 remote: 0x20c40e230c28f717 expref: 4 pid: 21457 timeout: 0 lvb_type: 0 Aug 28 10:00:44 fir-md1-s1 kernel: LustreError: 21457:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Aug 28 10:00:44 fir-md1-s1 kernel: LNet: Service thread pid 21457 completed after 81648.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 28 10:00:44 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Aug 28 10:06:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f54ca57d-f21f-fc73-ad63-df7922956fa9 (at 10.9.102.40@o2ib4) Aug 28 10:06:24 fir-md1-s1 kernel: Lustre: Skipped 807 previous similar messages Aug 28 10:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3cf2eac3-6000-d1ad-26af-7aa417c35563 (at 10.9.103.25@o2ib4) reconnecting Aug 28 10:07:01 fir-md1-s1 kernel: Lustre: Skipped 818 previous similar messages Aug 28 10:09:43 fir-md1-s1 kernel: LustreError: 21676:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f34f1f9e800 ns: mdt-fir-MDT0002_UUID lock: ffff8f1f159c4ec0/0x5d9ee6e6583bb38c lrc: 3/0,0 mode: PW/PW res: [0x2c002c0b5:0x180fa:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x50200400000020 nid: 10.9.109.58@o2ib4 remote: 0x20c40e230c28f725 expref: 2 pid: 21676 timeout: 0 lvb_type: 0 Aug 28 10:09:43 fir-md1-s1 kernel: Lustre: 
21676:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (50:82091s); client may timeout. req@ffff8f2789e82a00 x1636452790296256/t0(0) o101->eb3ddef3-17e8-8643-de57-b9fbaed7aaac@10.9.109.58@o2ib4:2/0 lens 480/536 e 0 to 0 dl 1566930092 ref 1 fl Complete:/0/0 rc -107/-107 Aug 28 10:09:43 fir-md1-s1 kernel: Lustre: 21676:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 28 10:09:43 fir-md1-s1 kernel: LNet: Service thread pid 21676 completed after 82141.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 28 10:16:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3b6d9c26-c17e-be49-3085-14268f72a0d1 (at 10.9.102.41@o2ib4) Aug 28 10:16:25 fir-md1-s1 kernel: Lustre: Skipped 812 previous similar messages Aug 28 10:17:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 683a48e7-a11e-d27d-92b8-e668e8ebb59d (at 10.9.102.47@o2ib4) reconnecting Aug 28 10:17:01 fir-md1-s1 kernel: Lustre: Skipped 804 previous similar messages Aug 28 10:26:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2b54c3ee-9728-0c5d-13b9-02c56cefb912 (at 10.8.28.7@o2ib6) Aug 28 10:26:26 fir-md1-s1 kernel: Lustre: Skipped 799 previous similar messages Aug 28 10:27:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c56b34d4-3ae6-19a5-6d19-cc66577d2e25 (at 10.9.102.17@o2ib4) reconnecting Aug 28 10:27:02 fir-md1-s1 kernel: Lustre: Skipped 804 previous similar messages Aug 28 10:27:24 fir-md1-s1 kernel: Lustre: 23713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f3e0fe5ce00 x1642617578792800/t0(0) o101->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:29/0 lens 584/3264 e 1 to 0 dl 1567013249 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 10:27:54 fir-md1-s1 kernel: Lustre: 23627:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending 
early reply req@ffff8f29c95cb000 x1642617579041632/t0(0) o101->705ae766-7496-3e3c-7a4b-0c1f4d988567@10.9.0.1@o2ib4:29/0 lens 584/3264 e 1 to 0 dl 1567013279 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 10:28:39 fir-md1-s1 kernel: LustreError: 21681:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567013229, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f4136e41b00/0x5d9ee6e786713af9 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21681 timeout: 0 lvb_type: 0 Aug 28 10:29:09 fir-md1-s1 kernel: LustreError: 23681:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567013259, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ace24c140/0x5d9ee6e78694ca10 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23681 timeout: 0 lvb_type: 0 Aug 28 10:30:29 fir-md1-s1 kernel: LNet: Service thread pid 21681 was inactive for 200.18s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 28 10:30:29 fir-md1-s1 kernel: Pid: 21681, comm: mdt03_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 10:30:29 fir-md1-s1 kernel: Call Trace: Aug 28 10:30:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 10:30:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 10:30:29 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 28 10:30:29 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 10:30:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 10:30:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 10:30:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 10:30:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 10:30:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 10:30:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1567013429.21681 Aug 28 10:30:59 fir-md1-s1 kernel: LNet: Service thread pid 23681 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 28 10:30:59 fir-md1-s1 kernel: Pid: 23681, comm: mdt02_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 10:30:59 fir-md1-s1 kernel: Call Trace: Aug 28 10:30:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 10:30:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 10:30:59 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 28 10:30:59 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 10:30:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 10:30:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 10:30:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 10:30:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 10:30:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 10:30:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1567013459.23681 Aug 28 10:36:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3cf0b37c-2835-0832-0bb4-dc0931773c92 (at 10.9.103.25@o2ib4) Aug 28 10:36:28 fir-md1-s1 kernel: Lustre: Skipped 842 previous similar messages Aug 28 10:37:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client e22e674f-6915-9577-9483-7e6e281a1562 (at 10.8.21.17@o2ib6) reconnecting Aug 28 10:37:02 fir-md1-s1 kernel: Lustre: Skipped 838 previous similar messages Aug 28 10:40:56 fir-md1-s1 kernel: 
LustreError: 21679:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8f4516ed5c00 ns: mdt-fir-MDT0002_UUID lock: ffff8f2e22671440/0x5d9ee6e6584477ad lrc: 3/0,0 mode: PW/PW res: [0x2c002c0b5:0x180fa:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x50200000000000 nid: 10.9.109.57@o2ib4 remote: 0xa69726dbefa841dd expref: 2 pid: 21679 timeout: 0 lvb_type: 0 Aug 28 10:40:56 fir-md1-s1 kernel: LustreError: 21679:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Aug 28 10:40:56 fir-md1-s1 kernel: Lustre: 21679:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (677:83304s); client may timeout. req@ffff8f318e366900 x1634188304209744/t0(0) o101->95cdb8fb-0e32-cb98-88bc-c0e9f3ec6a0b@10.9.109.57@o2ib4:15/0 lens 480/536 e 0 to 0 dl 1566930752 ref 1 fl Complete:/0/0 rc -107/-107 Aug 28 10:40:56 fir-md1-s1 kernel: Lustre: 21679:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Aug 28 10:40:56 fir-md1-s1 kernel: LNet: Service thread pid 21679 completed after 83981.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Aug 28 10:40:56 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Aug 28 10:44:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 634515dc-5f1c-c73b-9e38-157b83f4a562 (at 10.9.104.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8f4538d76000, cur 1567014277 expire 1567014127 last 1567014050 Aug 28 10:44:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 28 10:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 49defdc6-9c5a-7478-8696-e7769dc90bef (at 10.9.102.47@o2ib4) Aug 28 10:46:28 fir-md1-s1 kernel: Lustre: Skipped 836 previous similar messages Aug 28 10:47:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bdb11576-343d-220d-63d1-1ff1ea0ae4cb (at 10.8.28.7@o2ib6) reconnecting Aug 28 10:47:05 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Aug 28 10:47:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8f4535d36800, cur 1567014440 expire 1567014290 last 1567014213 Aug 28 10:47:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Aug 28 10:50:44 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Wed Aug 28 10:50:44 2019 Aug 28 10:52:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.2.21@o2ib6, removing former export from same NID Aug 28 10:52:34 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Aug 28 10:52:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.2.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Aug 28 10:53:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.13.1@o2ib6, removing former export from same NID Aug 28 10:56:24 fir-md1-s1 kernel: Lustre: 21679:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f32913eda00 x1642694394286880/t0(0) o101->7e8e3bc4-342c-6013-2c40-c6fa796bc32d@10.9.101.60@o2ib4:29/0 lens 584/3264 e 1 to 0 dl 1567014989 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 10:56:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to b6973ee9-fc9f-d2c6-1102-75dfdfeafb62 (at 10.9.0.64@o2ib4) Aug 28 10:56:29 fir-md1-s1 kernel: Lustre: Skipped 835 previous similar messages Aug 28 10:57:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bb520558-2d6f-90e6-2e79-a805392091ac (at 10.9.102.38@o2ib4) reconnecting Aug 28 10:57:05 fir-md1-s1 kernel: Lustre: Skipped 847 previous similar messages Aug 28 10:57:07 fir-md1-s1 kernel: Lustre: 24576:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-8), not sending early reply req@ffff8f1fb700da00 x1642694394291008/t0(0) o101->7e8e3bc4-342c-6013-2c40-c6fa796bc32d@10.9.101.60@o2ib4:12/0 lens 584/3264 e 1 to 0 dl 1567015032 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 10:57:40 fir-md1-s1 kernel: LustreError: 23603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567014969, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ab0598480/0x5d9ee6e78df1fd5f lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23603 timeout: 0 lvb_type: 0 Aug 28 10:58:10 fir-md1-s1 kernel: LustreError: 50447:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567014999, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff8f2920b8cec0/0x5d9ee6e78e1a0d54 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50447 timeout: 0 lvb_type: 0 Aug 28 10:59:30 fir-md1-s1 kernel: LNet: Service thread pid 23603 was inactive for 200.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 28 10:59:30 fir-md1-s1 kernel: Pid: 23603, comm: mdt02_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 10:59:30 fir-md1-s1 kernel: Call Trace: Aug 28 10:59:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 10:59:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 10:59:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 28 10:59:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 10:59:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 10:59:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 10:59:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 10:59:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 10:59:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 10:59:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1567015170.23603 Aug 28 11:00:00 fir-md1-s1 kernel: LNet: Service thread pid 50447 was inactive for 200.33s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 28 11:00:00 fir-md1-s1 kernel: Pid: 50447, comm: mdt01_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 11:00:00 fir-md1-s1 kernel: Call Trace: Aug 28 11:00:00 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 11:00:00 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 11:00:00 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 28 11:00:00 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 11:00:00 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 11:00:00 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 11:00:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 11:00:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 11:00:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 11:00:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1567015200.50447 Aug 28 11:02:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.13.1@o2ib6, removing former export from same NID Aug 28 11:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 21a10516-a548-5b71-4985-28948a2264c7 (at 10.8.21.17@o2ib6) Aug 28 11:06:29 fir-md1-s1 kernel: Lustre: Skipped 876 previous similar messages Aug 28 11:07:05 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client f37a46e0-1e70-6b27-1459-0c7be76fae27 (at 10.0.10.3@o2ib7) reconnecting Aug 28 11:07:05 fir-md1-s1 kernel: Lustre: Skipped 870 previous similar messages Aug 28 11:09:00 fir-md1-s1 kernel: Lustre: 50582:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8f2bb52ea400 x1641549646714896/t0(0) o101->a24ed69c-3a61-60fa-2e91-ac06e2c747e1@10.8.26.28@o2ib6:5/0 lens 584/3264 e 0 to 0 dl 1567015745 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 11:10:05 fir-md1-s1 kernel: LustreError: 23601:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567015715, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1943feda00/0x5d9ee6e7920116b2 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 38 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23601 timeout: 0 lvb_type: 0 Aug 28 11:11:55 fir-md1-s1 kernel: LNet: Service thread pid 23601 was inactive for 200.19s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Aug 28 11:11:55 fir-md1-s1 kernel: Pid: 23601, comm: mdt02_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 11:11:55 fir-md1-s1 kernel: Call Trace: Aug 28 11:11:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 11:11:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 11:11:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 28 11:11:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 11:11:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 11:11:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 11:11:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 11:11:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 11:11:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 11:11:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1567015915.23601 Aug 28 11:11:56 fir-md1-s1 kernel: Lustre: 23561:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8f0eaa486c00 x1643052947772848/t0(0) o101->f37a46e0-1e70-6b27-1459-0c7be76fae27@10.0.10.3@o2ib7:1/0 lens 584/3264 e 1 to 0 dl 1567015921 ref 2 fl Interpret:/0/0 rc 0/0 Aug 28 11:13:11 fir-md1-s1 kernel: LustreError: 23653:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 
1567015901, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f14e3261440/0x5d9ee6e793a06640 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 39 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23653 timeout: 0 lvb_type: 0 Aug 28 11:15:01 fir-md1-s1 kernel: LNet: Service thread pid 23653 was inactive for 200.64s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Aug 28 11:15:01 fir-md1-s1 kernel: Pid: 23653, comm: mdt00_097 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Aug 28 11:15:01 fir-md1-s1 kernel: Call Trace: Aug 28 11:15:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Aug 28 11:15:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Aug 28 11:15:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Aug 28 11:15:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Aug 28 11:15:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt] Aug 28 11:15:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Aug 28 11:15:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Aug 28 11:15:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Aug 28 11:15:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Aug 28 11:15:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Aug 28 11:15:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Aug 28 11:15:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Aug 28 11:15:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Aug 28 11:15:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Aug 28 11:15:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Aug 28 11:15:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Aug 28 11:15:02 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1567016102.23653 Aug 28 11:16:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to cbaa9137-615d-1663-c99f-cfd652da47fd (at 10.8.27.24@o2ib6) Aug 28 11:16:29 fir-md1-s1 kernel: Lustre: Skipped 865 previous similar messages Aug 28 11:17:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 397e53ea-489f-22f1-95c4-27ab82ab5709 (at 10.9.102.43@o2ib4) reconnecting Aug 28 11:17:08 fir-md1-s1 kernel: Lustre: Skipped 864 previous similar messages